Learning Similarities by Accumulating Evidence in a Probabilistic Way

  • Helena Aidos
  • Ana Fred
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8827)


Clustering ensembles exploit the diversity produced by multiple clustering algorithms to obtain a consensus partition. Evidence accumulation clustering (EAC) combines the output of a clustering ensemble into a co-association similarity matrix, which records how often each pair of objects is placed in the same cluster. A consensus partition is then obtained by applying a clustering technique to this matrix. We propose a new combination matrix in which the co-occurrences between objects are modeled probabilistically. We evaluate the proposed methodology using the dissimilarity increments distribution model. This distribution is based on a high-order dissimilarity measure, which uses triplets of nearest neighbors to identify sparse and odd-shaped clusters. Experimental results show that the proposed algorithm produces better and more robust results than EAC on both synthetic and real datasets.
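The classic EAC pipeline summarized above can be illustrated with a minimal sketch. It covers only the standard co-association matrix (the fraction of partitions in which two objects share a cluster), not the probabilistic combination matrix proposed in the paper, whose definition is not given in this abstract. The function names `co_association` and `consensus_by_threshold`, the toy ensemble, and the 0.5 threshold are assumptions made for illustration. The consensus step is assumed here to be a connected-components cut, which corresponds to cutting a single-link hierarchy at a fixed level; it is one possible clustering technique, not necessarily the one the authors use.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster.

    `partitions` is a list of label sequences, one per base clustering,
    all of the same length n. Returns an n x n list of lists.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same n objects")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(partitions)
    # An object always co-occurs with itself, so the diagonal is 1.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]


def consensus_by_threshold(coassoc, threshold=0.5):
    """Illustrative consensus step (an assumption, not the paper's method):
    link pairs whose co-association exceeds `threshold` and return the
    connected components as cluster labels."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy ensemble: three base partitions of six objects (hypothetical data).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
C = co_association(partitions)
print(consensus_by_threshold(C))  # → [0, 0, 0, 1, 1, 1]
```

In this toy ensemble, objects 0 and 1 share a cluster in all three partitions (co-association 1.0), while objects 2 and 3 share one in only a single partition (1/3), so the threshold separates the two groups despite the disagreement in the second partition.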


Clustering ensembles, co-association matrix, voting scheme, probabilistic learning of similarities, dissimilarity increments distribution



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Helena Aidos (1)
  • Ana Fred (1)
  1. Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
