A Modified Cop-Kmeans Algorithm Based on Sequenced Cannot-Link Set
Clustering with instance-level constraints has recently received much attention in the clustering community. In particular, must-link and cannot-link constraints between pairs of instances in the data set are a common form of prior knowledge incorporated into many clustering algorithms. This approach has been shown to guide a number of well-known clustering algorithms towards more accurate results. However, recent work has also shown that incorporating must-link and cannot-link constraints makes clustering algorithms overly sensitive to the "assignment order of instances", which can in turn lead to constraint violation. In this paper, we propose a modified version of Cop-Kmeans that relies on a sequenced assignment of cannot-linked instances. Experiments on four UCI data sets indicate that, compared with the original Cop-Kmeans, our method can effectively overcome the problem of constraint violation while achieving almost the same performance as the Cop-Kmeans algorithm.
Keywords: Semi-supervised clustering; Constraints; CLC-Kmeans; Constrained clustering
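The order sensitivity mentioned in the abstract can be illustrated with a minimal sketch of a Cop-Kmeans-style greedy assignment step. This is not the paper's algorithm: the function names (`violates`, `assign`), the one-dimensional toy data, and the fixed centers are all assumptions made for illustration. Each instance is placed in the nearest cluster that breaks no constraint, and the step fails when no such cluster exists.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: three 1-D instances, two fixed cluster centers.
POINTS: List[float] = [0.0, 10.0, 4.0]
CENTERS: List[float] = [0.0, 10.0]
MUST: Dict[int, Set[int]] = {}                         # no must-link constraints
CANNOT: Dict[int, Set[int]] = {0: {2}, 1: {2}, 2: {0, 1}}  # 2 cannot join 0 or 1


def violates(i: int, cluster: int, labels: Dict[int, int],
             must: Dict[int, Set[int]], cannot: Dict[int, Set[int]]) -> bool:
    """Return True if placing instance i in `cluster` breaks a constraint."""
    for j in must.get(i, ()):
        # A must-linked partner already sits in a different cluster.
        if j in labels and labels[j] != cluster:
            return True
    for j in cannot.get(i, ()):
        # A cannot-linked partner already sits in this cluster.
        if labels.get(j) == cluster:
            return True
    return False


def assign(order: List[int], points: List[float], centers: List[float],
           must: Dict[int, Set[int]],
           cannot: Dict[int, Set[int]]) -> Optional[Dict[int, int]]:
    """Assign instances in the given order to the nearest feasible center.

    Returns None when some instance has no feasible cluster left.
    """
    labels: Dict[int, int] = {}
    for i in order:
        ranked = sorted(range(len(centers)),
                        key=lambda c: (points[i] - centers[c]) ** 2)
        for c in ranked:
            if not violates(i, c, labels, must, cannot):
                labels[i] = c
                break
        else:
            return None  # every cluster violates a constraint
    return labels


failed = assign([0, 1, 2], POINTS, CENTERS, MUST, CANNOT)
succeeded = assign([2, 0, 1], POINTS, CENTERS, MUST, CANNOT)
```

With the order `[0, 1, 2]`, instances 0 and 1 occupy both clusters before instance 2 is considered, so instance 2 has no feasible cluster and the step fails. With the order `[2, 0, 1]`, the instance carrying the most cannot-link constraints is placed first and a feasible assignment exists. That second ordering is chosen by hand here and should not be read as the sequencing rule proposed in the paper.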