Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

  • Marek Gagolewski (email author)
  • Anna Cena
  • Maciej Bartoszuk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9880)


The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. The first extension replaces the classical centroid-based aggregation with generic distance-based penalty minimizers. As a result, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). The second extension is a correction that prevents the formation of clusters of highly unbalanced sizes: just like the recently introduced Genie approach, which extends the single linkage scheme, the new method keeps a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from rising above a predefined threshold. Numerous benchmarks indicate that introducing such a correction significantly increases the quality of the resulting clusterings.
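The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used normalized form of the Gini index for the inequity measure, and it uses a set medoid (the cluster member with the smallest total dissimilarity to the others) as one simple example of a distance-based penalty minimizer. The function names `gini_index`, `set_medoid`, and `hamming` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def gini_index(sizes):
    """Normalized Gini index of cluster sizes (assumed form):
    0 for perfectly balanced sizes, 1 when one cluster holds everything."""
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0
    # Sum of absolute differences over all unordered pairs of sizes.
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((n - 1) * total)


def set_medoid(points, dissimilarity):
    """Illustrative penalty minimizer: the member of `points` that
    minimizes the sum of dissimilarities to all members."""
    return min(points, key=lambda y: sum(dissimilarity(x, y) for x in points))


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


# Balanced vs. unbalanced cluster sizes.
balanced = gini_index([5, 5, 5, 5])
unbalanced = gini_index([1, 9])

# A cluster representative for DNA-like strings, where no centroid exists.
representative = set_medoid(["ACGT", "ACGA", "TCGA"], hamming)
```

In a Genie-style correction, a value such as `unbalanced` would be compared against the predefined threshold before each merge. The medoid needs only a dissimilarity function, so it works where a coordinate-wise mean is undefined.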


Hierarchical clustering · Aggregation · Centroid · Gini-index · Genie algorithm



This study was supported by the National Science Center, Poland, research project 2014/13/D/HS4/01700.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Marek Gagolewski (1, 2), email author
  • Anna Cena (1)
  • Maciej Bartoszuk (2)

  1. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
  2. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
