A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering

Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The intelligent Minkowski and weighted Minkowski K-means are recently developed, effective clustering algorithms capable of computing feature weights. Their cluster-specific weights follow the intuitive idea that a feature with low dispersion in a given cluster should receive a greater weight in that cluster than a feature with high dispersion. The final clustering produced by these techniques depends on the choice of the Minkowski exponent. The median-based central consensus rule we introduce in this paper allows one to select an optimal value of this exponent. Our rule takes into account the values of the Adjusted Rand Index (ARI) between the clustering solutions obtained for different Minkowski exponents and selects the clustering that yields the highest average ARI. Our simulations, carried out with real and synthetic data, show that the proposed median-based consensus procedure usually outperforms clustering strategies based on selecting the highest value of the Silhouette or Calinski–Harabasz cluster validity indices.
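
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is based only on the abstract's wording (compute the ARI between the clusterings obtained for different exponents, then keep the clustering with the highest average ARI), so the chapter's exact procedure may differ. The clustering runs themselves are not implemented here: the label vectors, the exponent grid, and the function names `adjusted_rand_index` and `select_consensus_exponent` are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert and Arabie, 1985) between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are required")
    # Pair counts taken from the contingency table and from its margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (for example, both partitions have one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def select_consensus_exponent(partitions):
    """Pick the exponent whose clustering has the highest mean ARI to the others.

    `partitions` maps each candidate Minkowski exponent to the list of
    cluster labels obtained with that exponent (assumed to be precomputed).
    Ties are resolved in favour of the first exponent in the mapping.
    """
    if len(partitions) < 2:
        raise ValueError("at least two clusterings are needed for a consensus")
    scores = {}
    for p, labels in partitions.items():
        aris = [adjusted_rand_index(labels, other)
                for q, other in partitions.items() if q != p]
        scores[p] = sum(aris) / len(aris)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical label vectors for four exponents (illustrative values only).
partitions = {
    1.0: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    2.0: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    3.0: [0, 0, 1, 1, 1, 1, 2, 2, 2],
    4.0: [0, 1, 2, 0, 1, 2, 0, 1, 2],
}
best_p, best_score = select_consensus_exponent(partitions)
print(f"selected exponent: {best_p}, mean ARI to the others: {best_score:.3f}")
```

Unlike the Silhouette or Calinski–Harabasz indices, this rule needs neither the data matrix nor a distance function at selection time: it works from the label vectors alone, which is why the sketch takes precomputed partitions as input.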

Corresponding author

Correspondence to Vladimir Makarenkov.

Copyright information

© 2017 Springer International Publishing AG

Cite this paper

de Amorim, R.C., Tahiri, N., Mirkin, B., Makarenkov, V. (2017). A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_8