Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems

  • Nir Ailon
  • Edo Liberty
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5555)

Abstract

Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied in theory and in practice and was subsequently proven to be APX-hard.
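To make the agnostic cost concrete, here is a minimal Python sketch that counts the pairs on which a clustering disagrees with the input similarity function. The names `agnostic_cost`, `labels`, and `similar` are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

def agnostic_cost(labels, similar):
    """Count element pairs on which the clustering disagrees with the input.

    labels:  dict mapping element -> cluster id (the output clustering)
    similar: function (u, v) -> bool, the possibly inconsistent input
    """
    cost = 0
    for u, v in combinations(sorted(labels), 2):
        together = labels[u] == labels[v]
        if together != similar(u, v):
            cost += 1
    return cost

# Inconsistent toy input: a~b and b~c are similar, but a and c are not.
pairs = {frozenset("ab"), frozenset("bc")}
similar = lambda u, v: frozenset((u, v)) in pairs

one_cluster = {"a": 0, "b": 0, "c": 0}
split = {"a": 0, "b": 0, "c": 1}
singletons = {"a": 0, "b": 1, "c": 2}
```

On this toy input no clustering reaches cost zero, which is what "possibly inconsistent" means in practice: every partition of the three elements violates at least one stated pair.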

In this work we assume that an unknown correct clustering of the data does exist. In this setting, we argue that it is more reasonable to measure the output clustering’s accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric. This technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate for the possibility of achieving an approximation factor strictly less than 2 for our problem. This approximation factor could not have been achieved by considering the agnostic version of the problem unless P = NP.
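The two ideas in this paragraph, measuring cost against a true clustering and rounding a pseudometric into a clustering, can be sketched in Python as follows. The abstract does not spell out the rounding algorithm, so `pivot_round` below is an assumed pivot-style stand-in rather than the authors' method. All function and variable names are hypothetical.

```python
import random
from itertools import combinations

def clustering_distance(labels_a, labels_b):
    """Number of element pairs that one clustering joins and the other separates."""
    return sum(
        (labels_a[u] == labels_a[v]) != (labels_b[u] == labels_b[v])
        for u, v in combinations(sorted(labels_a), 2)
    )

def pivot_round(elements, d, rng):
    """Pivot-style randomized rounding (assumed stand-in, not the paper's algorithm).

    d(u, v) is a pseudometric with values in [0, 1]; small means "likely together".
    Repeatedly pick a random pivot and attach each remaining element v to it
    with probability 1 - d(pivot, v).
    """
    remaining = list(elements)
    labels = {}
    cluster_id = 0
    while remaining:
        pivot = rng.choice(remaining)
        members = [
            v for v in remaining
            if v == pivot or rng.random() < 1.0 - d(pivot, v)
        ]
        for v in members:
            labels[v] = cluster_id
        remaining = [v for v in remaining if v not in labels]
        cluster_id += 1
    return labels

# Toy check: the 0/1 pseudometric induced by a clustering rounds back to it.
truth = {"a": 0, "b": 0, "c": 1, "d": 1}
d = lambda u, v: 0.0 if truth[u] == truth[v] else 1.0
rounded = pivot_round(list(truth), d, random.Random(0))
other = {"a": 0, "b": 1, "c": 1, "d": 1}
```

In the non-agnostic setting described above, `clustering_distance(output, truth)` plays the role of the "true" cost, although the truth is of course unknown to the algorithm and is only available here because the example constructs it.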


References

  1. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning Journal (Special Issue on Theoretical Advances in Data Clustering) 56(1–3), 89–113 (2004); Extended abstract appeared in FOCS 2002, pp. 238–247
  2. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 684–693 (2005)
  3. Bonizzoni, P., Della Vedova, G., Dondi, R., Jiang, T.: On the approximation of correlation clustering and consensus clustering. Journal of Computer and System Sciences 74(5), 671–696 (2008)
  4. Emanuel, D., Fiat, A.: Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 208–220. Springer, Heidelberg (2003)
  5. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Boston, pp. 524–533 (2003)
  6. Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA 2006: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pp. 1167–1176. ACM Press, New York (2006)
  7. Ailon, N., Charikar, M.: Fitting tree metrics: Hierarchical clustering and phylogeny. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS (2005)
  8. Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo (to appear, 2005)
  9. Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. In: Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI), Sacramento, pp. 418–425 (2003)
  10. Strehl, A.: Relationship-based clustering and cluster ensembles for high-dimensional data mining. PhD Dissertation, University of Texas at Austin (May 2002)
  11. McSherry, F.: Spectral partitioning of random graphs. In: FOCS 2001: Proceedings of the 42nd IEEE symposium on Foundations of Computer Science, Washington, p. 529 (2001)
  12. Aslam, J., Leblanc, A., Stein, C.: A new approach to clustering. In: 4th International Workshop on Algorithm Engineering (2000)
  13. Ailon, N., Mohri, M.: Efficient reduction of ranking to classification. In: The 21st Annual Conference on Learning Theory (COLT), Helsinki, Finland (to appear, 2008)
  14. Balcan, M.-F., Blum, A., Gupta, A.: Approximate clustering without the approximation. In: SODA 2009, New York (2009)
  15. Ailon, N., Liberty, E.: Correlation clustering revisited: The “true” cost of error minimization problems. Yale University Technical Report 1214 (2008)
  16. Ailon, N.: Aggregation of partial rankings, p-ratings and top-m lists. In: SODA (2007)
  17. Ailon, N., Liberty, E.: Mathematica program (2008), http://www.cs.yale.edu/homes/el327/public/prove32/

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nir Ailon (1)
  • Edo Liberty (2)
  1. Google Research, New York, USA
  2. Yale University, New Haven, USA
