Knowledge and Information Systems, Volume 47, Issue 2, pp 329–354

On strategies for building effective ensembles of relative clustering validity criteria

  • Pablo A. Jaskowiak
  • Davoud Moulavi
  • Antonio C. S. Furtado
  • Ricardo J. G. B. Campello
  • Arthur Zimek
  • Jörg Sander
Regular Paper


Evaluation and validation are essential tasks for achieving meaningful clustering results. Relative validity criteria are measures usually employed in practice to select and validate clustering solutions, as they enable the evaluation of single partitions and the comparison of partition pairs in relative terms based only on the data under analysis. A plethora of relative validity measures is described in the clustering literature, which makes it difficult to choose an appropriate measure for a given application. One reason for this variety is that no single measure can capture all aspects of the clustering problem and, as such, each of them is prone to fail in particular application scenarios. In the present work, we take advantage of the diversity in relative validity measures from the clustering literature. Previous work showed that when different relative validity criteria are selected at random for an ensemble (from an initial set of 28 measures), one can expect with great certainty only to improve on the worst criterion included in the ensemble. In this paper, we propose a method for selecting measures that meet a minimum level of effectiveness and exhibit some degree of complementarity (from the same set of 28 measures) into ensembles, which show superior performance when compared to any single ensemble member (and not just the worst one) over a variety of datasets. One can also expect greater stability of the evaluation across datasets, even when considering different ensemble strategies. Our results are based on more than a thousand datasets, synthetic and real, from different sources.
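As a rough illustration of what combining relative validity criteria can look like, the sketch below ranks candidate partitions under each criterion and averages the ranks. It is a minimal sketch under stated assumptions, not the selection method proposed in the paper: the function names (`ranks`, `mean_rank_ensemble`), the three criteria and all score values are hypothetical, and mean-rank aggregation is used only as one simple stand-in for an ensemble strategy.

```python
from statistics import mean


def ranks(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend the block [pos, end] over all entries tied with order[pos].
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def mean_rank_ensemble(criteria):
    """criteria: list of (scores, higher_is_better) pairs, one per criterion.

    Returns the mean rank of each candidate partition (lower is better).
    """
    per_criterion = [ranks(s, hib) for s, hib in criteria]
    return [mean(col) for col in zip(*per_criterion)]


# Hypothetical scores of three criteria on four candidate partitions.
criteria = [
    ([0.41, 0.62, 0.58, 0.30], True),     # silhouette-like: higher is better
    ([1.20, 0.70, 0.90, 1.50], False),    # Davies-Bouldin-like: lower is better
    ([150.0, 210.0, 230.0, 90.0], True),  # Calinski-Harabasz-like: higher is better
]

combined = mean_rank_ensemble(criteria)
best = min(range(len(combined)), key=combined.__getitem__)
print("mean ranks:", combined)
print("selected partition index:", best)
```

Working on ranks rather than raw values sidesteps the fact that different criteria live on different scales and may be maximised or minimised; which members to include in the ensemble is the question the paper itself addresses and is not modelled here.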


Clustering, Clustering validation, Relative validity criteria, Relative validity indices, Ensemble, Combination, Aggregation



This project was partially funded by the Canadian research agency NSERC and by the Brazilian research agencies CNPq and FAPESP. Pablo A. Jaskowiak thanks FAPESP (Grants #2012/15751-9 and #2011/04247-5). Ricardo J. G. B. Campello thanks CNPq (Grant #304137/2013-8) and FAPESP (Grants #2010/20032-6 and #2013/18698-4).



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Pablo A. Jaskowiak (1, corresponding author)
  • Davoud Moulavi (2)
  • Antonio C. S. Furtado (2)
  • Ricardo J. G. B. Campello (1)
  • Arthur Zimek (3)
  • Jörg Sander (2)

  1. Department of Computer Science, University of São Paulo, São Carlos, Brazil
  2. Department of Computing Science, University of Alberta, Edmonton, Canada
  3. Ludwig-Maximilians-Universität München, Munich, Germany
