A Modified Cop-Kmeans Algorithm Based on Sequenced Cannot-Link Set

  • Tonny Rutayisire
  • Yan Yang
  • Chao Lin
  • Jinyuan Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6954)

Abstract

Clustering with instance-level constraints has recently received much attention in the clustering community. In particular, must-link and cannot-link constraints between a given pair of instances in the data set are common prior knowledge incorporated in many clustering algorithms today. This approach has been shown to be successful in guiding a number of well-known clustering algorithms towards more accurate results. However, recent work has also shown that incorporating must-link and cannot-link constraints makes clustering algorithms overly sensitive to the "assignment order of instances" and therefore results in constraint violation. In this paper, we propose a modified version of Cop-Kmeans which relies on a sequenced assignment of cannot-linked instances. In comparison with the original Cop-Kmeans, experiments on four UCI data sets indicate that our method can effectively overcome the problem of "constraint violation", with almost the same performance as the Cop-Kmeans algorithm.
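
The sensitivity to assignment order can be seen in a small sketch of a Cop-Kmeans-style assignment pass: each instance goes to its nearest cluster that breaks no constraint, and the pass fails if no such cluster exists. This is an illustration under stated assumptions, not the method proposed in the paper. The names `violates` and `assign_in_order`, the toy points, and the idea of placing the cannot-linked instance first are all hypothetical choices made for the example.

```python
from math import dist


def violates(i, cluster, assign, must_link, cannot_link):
    """Return True if putting instance i into `cluster` breaks a constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner that is already placed elsewhere is a violation.
        if other is not None and other in assign and assign[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner already in this cluster is a violation.
        if other is not None and assign.get(other) == cluster:
            return True
    return False


def assign_in_order(points, centers, order, must_link=(), cannot_link=()):
    """Run one assignment pass over the instances in `order`.

    Returns (assignment, failed), where `failed` is the first instance
    with no feasible cluster, or None if every instance was placed.
    """
    assign = {}
    for i in order:
        # Try clusters from nearest to farthest center.
        ranked = sorted(range(len(centers)),
                        key=lambda c: dist(points[i], centers[c]))
        for c in ranked:
            if not violates(i, c, assign, must_link, cannot_link):
                assign[i] = c
                break
        else:
            return assign, i  # no feasible cluster: constraint violation
    return assign, None


# Toy data (hypothetical): instance 2 cannot share a cluster with 0 or 1.
points = [(0.0, 0.0), (4.0, 0.0), (1.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
cannot_link = [(0, 2), (1, 2)]

# Natural order: 0 and 1 occupy both clusters, so 2 has nowhere to go.
print(assign_in_order(points, centers, [0, 1, 2], cannot_link=cannot_link))
# Cannot-linked instance first: every instance finds a feasible cluster.
print(assign_in_order(points, centers, [2, 0, 1], cannot_link=cannot_link))
```

With the same data and centers, the first order stops at instance 2 with no feasible cluster, while the second order places all three instances. Only the order changed, which is the kind of failure a sequenced assignment of cannot-linked instances is meant to avoid.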

Keywords

Semi-supervised clustering; Constraints; CLC-Kmeans; Constrained clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tonny Rutayisire (1)
  • Yan Yang (1)
  • Chao Lin (1)
  • Jinyuan Zhang (1)
  1. School of Information Science & Technology, Southwest Jiaotong University, Chengdu, P.R. China