Machine Learning

Volume 98, Issue 1–2, pp 331–357

Probabilistic consensus clustering using evidence accumulation

  • André Lourenço
  • Samuel Rota Bulò
  • Nicola Rebagliati
  • Ana L. N. Fred
  • Mário A. T. Figueiredo
  • Marcello Pelillo

Abstract

Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
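
To make the construction concrete, the following Python sketch (not taken from the paper) builds a co-association matrix from an ensemble of crisp partitions and then fits row-stochastic soft assignments Y so that Y Yᵀ approximates the observed co-occurrence frequencies under the squared Euclidean divergence, the simplest member of the Bregman family. The projected-gradient solver, the step size, and all function names are illustrative assumptions, not the authors' algorithm.

import numpy as np

def coassociation_matrix(partitions, n):
    # Accumulate pairwise co-occurrence frequencies over an ensemble of crisp
    # partitions; each partition is a length-n array of cluster labels.
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(partitions)

def project_rows_to_simplex(Y):
    # Euclidean projection of every row of Y onto the probability simplex.
    n, k = Y.shape
    U = -np.sort(-Y, axis=1)                      # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = np.sum(U - css / idx > 0, axis=1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(Y - theta[:, None], 0.0)

def fit_soft_assignments(C, k, n_iter=500, lr=0.05, seed=0):
    # Projected-gradient fit of soft assignments Y (n x k, rows on the simplex)
    # so that Y @ Y.T approximates the co-association matrix C under the
    # squared Euclidean divergence (an illustrative Bregman-divergence choice).
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    Y = project_rows_to_simplex(rng.random((n, k)))
    for _ in range(n_iter):
        R = Y @ Y.T - C                           # residual between model and evidence
        grad = 4.0 * R @ Y                        # gradient of ||C - Y Y^T||_F^2
        Y = project_rows_to_simplex(Y - lr * grad)
    return Y

# Example: three noisy two-cluster labelings of six points.
partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
C = coassociation_matrix(partitions, n=6)
Y = fit_soft_assignments(C, k=2)
print(np.round(Y, 2))                             # probabilistic cluster memberships

The method proposed in the paper accommodates any double-convex Bregman divergence; the squared-error choice above is only the most elementary concrete instance of the idea.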

Keywords

Consensus clustering · Evidence accumulation · Ensemble clustering · Bregman divergence


Copyright information

© The Author(s) 2013

Authors and Affiliations

  • André Lourenço (1, 2)
  • Samuel Rota Bulò (3)
  • Nicola Rebagliati (4)
  • Ana L. N. Fred (2, 5)
  • Mário A. T. Figueiredo (2, 5)
  • Marcello Pelillo (3)

  1. Instituto Superior de Engenharia de Lisboa, Lisboa, Portugal
  2. Instituto de Telecomunicações, Lisboa, Portugal
  3. DAIS, Mestre, Venezia, Italy
  4. VTT Technical Research Center of Finland, VTT, Finland
  5. Instituto Superior Técnico, Lisboa, Portugal
