Semi-supervised Constrained Clustering with Cluster Outlier Filtering

  • Cristián Bravo
  • Richard Weber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7042)


Constrained clustering addresses the problem of creating minimum-variance clusters under the added requirement that the elements of each cluster fulfill a set of constraints. Research in this area has focused on “must-link” and “cannot-link” constraints, under which pairs of elements must be placed in the same cluster or in different clusters, respectively. In this work we present a heuristic procedure for clustering into two classes when the constraints affect all the elements of the two clusters and depend on the elements present in the cluster. This problem is highly susceptible to outliers in each cluster (extreme values that create infeasible solutions), so the procedure eliminates elements with extreme values in both clusters while still achieving adequate performance measures. Experiments on a company database reveal a great deal of information, with results that are more readily interpretable than those of classical k-means clustering.
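
The pairwise constraints mentioned above can be illustrated with a small check. This is only a sketch of the general “must-link” / “cannot-link” idea, not the heuristic procedure of the paper; the function name `violated_constraints` and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two elements in the data set


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs broken by an assignment.

    labels[i] is the cluster assigned to element i (hypothetical layout).
    """
    # A must-link pair is broken when its elements land in different clusters.
    broken_must = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is broken when its elements share a cluster.
    broken_cannot = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_must, broken_cannot


# Toy example: four elements assigned to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
broken_must, broken_cannot = violated_constraints(labels, must_link, cannot_link)
```

An assignment is feasible with respect to the pairwise constraints when both returned lists are empty. The cluster-level constraints studied in the paper differ in that they depend on all the elements of a cluster rather than on individual pairs.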


Cluster Procedure; Heuristic Procedure; Pairwise Constraint; Company Database; Unsupervised Pattern Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Cristián Bravo (1)
  • Richard Weber (1)
  1. Department of Industrial Engineering, Universidad de Chile, Chile
