Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9880)

Abstract

The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. The first extension replaces the classical aggregation by means of centroids with generic distance-based penalty minimizers. As a result, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction is applied that prevents the formation of clusters of highly unbalanced sizes: just as in the recently introduced Genie approach, which extends the single linkage scheme, the new method keeps a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from rising above a predefined threshold. Numerous benchmarks indicate that introducing such a correction significantly increases the quality of the resulting clusterings.
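
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The medoid is used here only as one possible distance-based penalty minimizer (an assumption; the abstract does not say which minimizers the paper uses), the Gini index is taken in its normalized form over cluster sizes, and the function names `medoid`, `gini_index`, `must_involve_smallest`, and `hamming` are hypothetical.

```python
from itertools import combinations


def medoid(points, dissimilarity):
    """Return the element of `points` that minimizes the summed dissimilarity
    to all elements. Illustrative stand-in for a penalty minimizer."""
    return min(points, key=lambda c: sum(dissimilarity(c, p) for p in points))


def gini_index(sizes):
    """Normalized Gini index of cluster sizes: 0 when all sizes are equal,
    1 when a single cluster holds every point."""
    n = len(sizes)
    if n < 2:
        return 0.0
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((n - 1) * sum(sizes))


def must_involve_smallest(sizes, threshold):
    """Assumed Genie-style rule: once the inequity measure exceeds the
    threshold, the next merge should involve a smallest cluster."""
    return gini_index(sizes) > threshold


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    sequences = ["AAAA", "AAAT", "AATT"]
    print(medoid(sequences, hamming))
    print(gini_index([1, 1, 8]))
    print(must_involve_smallest([1, 1, 8], threshold=0.3))
```

Because `medoid` needs only a dissimilarity function, the same code works for strings, images, or any other objects for which such a function is available, which is the point the abstract makes about arbitrary dissimilarity measures.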

Keywords

Hierarchical clustering, Aggregation, Centroid, Gini-index, Genie algorithm

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Marek Gagolewski (1, 2)
  • Anna Cena (1)
  • Maciej Bartoszuk (2)
  1. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
  2. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
