Several clustering algorithms equipped with pairwise hard constraints between data points are known to improve the accuracy of clustering solutions. We develop a new clustering algorithm that extends mixture clustering to handle (i) soft constraints and (ii) group-level constraints. Soft constraints can reflect the uncertainty associated with a priori knowledge about pairs of points that should or should not belong to the same cluster, while group-level constraints can capture larger building blocks of the target partition when the side information affords them. Assuming that the data points are generated by a mixture of Gaussians, we derive an EM algorithm to estimate the parameters of the different clusters. An empirical study demonstrates that the use of soft constraints results in superior data partitions that are normally unattainable without constraints. Further, the solutions are more robust when the hard constraints may be incorrect.
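
As a rough illustration of the group-level case only, the sketch below runs EM on a one-dimensional Gaussian mixture in which all points of a group are forced to share one cluster label. It is not the algorithm derived in the paper: soft constraints are not modeled at all, and the function names `log_gaussian` and `em_gmm_groups`, the per-group estimate of the mixing weights, the variance floor, and the toy data are assumptions made for this example.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def em_gmm_groups(groups, means, variances, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture in which each group shares one label.

    groups: list of lists of floats; a singleton list is an unconstrained point.
    Returns (means, variances, weights, resp), where resp[g][j] is the
    posterior probability that group g belongs to component j.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: one posterior per group, from the summed log-likelihoods
        # of its members (assumption: members are independent given the label).
        resp = []
        for g in groups:
            logp = [
                math.log(weights[j])
                + sum(log_gaussian(x, means[j], variances[j]) for x in g)
                for j in range(k)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: every point inherits the responsibility of its group.
        for j in range(k):
            n_pts = sum(r[j] * len(g) for r, g in zip(resp, groups))
            if n_pts < 1e-12:
                continue  # component is empty; keep its old parameters
            means[j] = sum(r[j] * x for r, g in zip(resp, groups) for x in g) / n_pts
            sq = sum(r[j] * (x - means[j]) ** 2 for r, g in zip(resp, groups) for x in g)
            variances[j] = max(sq / n_pts, 1e-6)  # floor avoids a collapsing variance
            # Assumption: mixing weights are estimated per group, not per point.
            weights[j] = max(sum(r[j] for r in resp) / len(groups), 1e-12)
    return means, variances, weights, resp

# Toy data (hypothetical): two groups of two points plus four free points.
groups = [[-0.2, 0.1], [0.0], [0.3], [4.8, 5.1], [5.0], [5.3]]
means, variances, weights, resp = em_gmm_groups(
    groups, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Treating each group as a single unit in the E-step is what enforces the constraint: the members of a group cannot be split across components, because they share one posterior. Handling uncertain pairwise knowledge would require a different E-step, which this sketch does not attempt.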


Keywords: Cluster Solution, Soft Constraint, Hard Constraint, Cluster Label, Pairwise Constraint



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Martin H. C. Law (1)
  • Alexander Topchy (1)
  • Anil K. Jain (1)
  1. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
