Correlation Clustering and Consensus Clustering

  • Paola Bonizzoni
  • Gianluca Della Vedova
  • Riccardo Dondi
  • Tao Jiang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3827)

Abstract

The Correlation Clustering problem has been introduced recently [5] as a model for clustering data when a binary relationship between data points is known. More precisely, for each pair of points we have two scores measuring respectively the similarity and dissimilarity of the two points, and we would like to compute an optimal partition where the value of a partition is obtained by summing up scores of pairs involving points from a same cluster and scores of pairs involving points from different clusters. A closely related problem is Consensus Clustering, where we are given a set of partitions and we would like to obtain a partition that best summarizes the input partitions. The latter problem is a restricted case of Correlation Clustering. In this paper we prove that Min Consensus Clustering is APX-hard even for three input partitions, answering an open question, while Max Consensus Clustering admits a PTAS on instances with a bounded number of input partitions. We exhibit a combinatorial and practical \({4}\over{5}\)-approximation algorithm based on a greedy technique for Max Consensus Clustering on three partitions. Moreover, we prove that a PTAS exists for Max Correlation Clustering when the maximum ratio between two scores is at most a constant.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theoretical Computer Science 237(1–2), 123–134 (2000)MATHCrossRefMathSciNetGoogle Scholar
  2. 2.
    Arora, S., Karger, D., Karpinski, M.: Polynomial time approximation schemes for dense instances of \(\mathcal{NP}\)-hard problems. Journal of Computer and System Sciences 58, 193–210 (2000)CrossRefMathSciNetGoogle Scholar
  3. 3.
    Ailon, N., Charikar, M., Newman, A.: Aggregating Inconsistent Information: Ranking and Clustering. In: Proc. 37th Symposium on Theory of Computing (STOC 2005), pp. 684–693 (2005)Google Scholar
  4. 4.
    Ausiello, G., Crescenzi, P., Gambosi, V., Kann, G., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial optimization problems and their approximability properties. Springer, Heidelberg (1999)MATHGoogle Scholar
  5. 5.
    Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning 56(1-3), 89–113 (2004)MATHCrossRefGoogle Scholar
  6. 6.
    Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. In: Proc. 44th Symp. Foundations of Computer Science (FOCS), pp. 524–533 (2003)Google Scholar
  7. 7.
    Demaine, E.D., Immorlica, N.: Correlation clustering with partial information. In: Arora, S., Jansen, K., Rolim, J.D.P., Sahai, A. (eds.) RANDOM 2003 and APPROX 2003. LNCS, vol. 2764, pp. 1–13. Springer, Heidelberg (2003)Google Scholar
  8. 8.
    Emanuel, D., Fiat, A.: Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 208–220. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  9. 9.
    Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. In: Proc. 15th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 418–425 (2003)Google Scholar
  10. 10.
    Filkov, V., Skiena, S.: Heterogeneous data integration with the consensus clustering formalism. In: Rahm, E. (ed.) DILS 2004. LNCS (LNBI), vol. 2994, pp. 110–123. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  11. 11.
    Grötschel, M., Wakabayashi, Y.: A cutting plane algorithm for a clustering problem. Mathematical Programming 45, 52–96 (1989)CrossRefGoogle Scholar
  12. 12.
    Krivanek, M., Moravek, J.: Hard problems in hierarchical-tree clustering. Acta Informatica 23, 311–323 (1986)MATHCrossRefMathSciNetGoogle Scholar
  13. 13.
    Swamy, C.: Correlation clustering: maximizing agreements via semidefinite programming. In: Proc. 15th Symp. on Discrete Algorithms (SODA), pp. 526–527 (2004)Google Scholar
  14. 14.
    Wakabayashi, Y.: The complexity of computing medians of relations. Resenhas 3(3), 323–349 (1998)MATHMathSciNetGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Paola Bonizzoni
    • 1
  • Gianluca Della Vedova
    • 2
  • Riccardo Dondi
    • 1
  • Tao Jiang
    • 3
    • 4
  1. 1.Dipartimento di Informatica, Sistemistica e ComunicazioneUniversità degli Studi di Milano-Bicocca MilanoItaly
  2. 2.Dipartimento di StatisticaUniversità degli Studi di Milano-Bicocca MilanoItaly
  3. 3.Department of Computer Science and EngineeringUniversity of CaliforniaRiversideUSA
  4. 4.Center for Advanced StudyTsinghua UniversityBeijingChina

Personalised recommendations