Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Correlation Clustering

  • Anthony Wirth
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_176



In its rawest form, correlation clustering is a graph optimization problem. Consider a clustering $C$ to be a mapping from the elements to be clustered, $V$, to the set $\{1, \ldots, |V|\}$, so that $u$ and $v$ are in the same cluster if and only if $C[u] = C[v]$. Given a collection of items in which each pair $(u, v)$ has two weights, $w_{uv}^{+}$ and $w_{uv}^{-}$, we must find a clustering $C$ that minimizes
$$\sum \limits_{C[u]=C[v]}{w}_{uv}^{-} + \sum \limits_{C[u]\neq C[v]}{w}_{uv}^{+}\,.$$
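
To make the objective concrete, the following is a minimal Python sketch that evaluates it for a given clustering: a pair placed in the same cluster contributes its $w_{uv}^{-}$ weight, and a pair split across clusters contributes its $w_{uv}^{+}$ weight. The function names, the dictionary representation of the weights, and the toy instance are all assumptions made for illustration and do not come from the entry. The exhaustive search is included only to show what "minimizes" means on a tiny input; it is exponential in $|V|$ and is not an algorithm proposed by the entry.

```python
from itertools import combinations, product


def disagreement_cost(clustering, w_plus, w_minus):
    """Objective value: w_minus over co-clustered pairs plus w_plus over split pairs.

    clustering maps each item to a cluster label; w_plus and w_minus map
    pairs (u, v) with u < v to non-negative weights (missing pairs count as 0).
    """
    cost = 0.0
    for u, v in combinations(sorted(clustering), 2):
        if clustering[u] == clustering[v]:
            cost += w_minus.get((u, v), 0.0)  # penalty for joining u and v
        else:
            cost += w_plus.get((u, v), 0.0)   # penalty for separating u and v
    return cost


def brute_force_clustering(items, w_plus, w_minus):
    """Try every labelling with labels 1..|V|; feasible only for tiny inputs."""
    items = sorted(items)
    labels = range(1, len(items) + 1)
    best, best_cost = None, float("inf")
    for assignment in product(labels, repeat=len(items)):
        clustering = dict(zip(items, assignment))
        cost = disagreement_cost(clustering, w_plus, w_minus)
        if cost < best_cost:
            best, best_cost = clustering, cost
    return best, best_cost


# Hypothetical toy instance: a and b belong together, c is dissimilar to both.
w_plus = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "c"): 0.2}
w_minus = {("a", "b"): 0.0, ("a", "c"): 1.0, ("b", "c"): 0.5}
best, best_cost = brute_force_clustering(["a", "b", "c"], w_plus, w_minus)
```

On this toy instance the cheapest clustering places a and b together and c on its own; the only penalty paid is the $w^{+}$ weight of the pair (b, c).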

Copyright information

© Springer Science+Business Media, LLC 2011
