Constraint Selection by Committee: An Ensemble Approach to Identifying Informative Constraints for Semi-supervised Clustering

Greene, Derek; Cunningham, Pádraig

doi:10.1007/978-3-540-74958-5_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4701)

Included in the following conference series: European Conference on Machine Learning

Abstract

A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly impact upon the level of improvement. We demonstrate that it is often possible to correctly “guess” a large number of constraints without supervision from the co-associations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.
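
The co-association idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy ensemble, and the 0.1 and 0.9 thresholds are assumptions chosen for illustration. Each pair of objects is scored by the fraction of ensemble members that place both objects in the same cluster. Pairs scoring near 1 or near 0 are treated as "guessed" must-link or cannot-link constraints, and pairs scoring near 0.5 are ranked first as candidates for supervision.

```python
import numpy as np

def co_association(labelings):
    """Fraction of ensemble members placing each pair of objects in the same cluster."""
    labelings = np.asarray(labelings)              # shape: (n_clusterings, n_objects)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)                       # shape: (n_objects, n_objects)

def split_pairs(coassoc, low=0.1, high=0.9):
    """Split pairs (i < j) into guessed must-links, guessed cannot-links,
    and uncertain pairs. The thresholds are illustrative assumptions."""
    rows, cols = np.triu_indices_from(coassoc, k=1)
    values = coassoc[rows, cols]
    pairs = [(int(a), int(b)) for a, b in zip(rows, cols)]

    must = [p for p, v in zip(pairs, values) if v >= high]
    cannot = [p for p, v in zip(pairs, values) if v <= low]

    # Uncertain pairs, ordered so that those closest to 0.5 come first.
    order = np.argsort(np.abs(values - 0.5), kind="stable")
    uncertain = [pairs[k] for k in order if low < values[k] < high]
    return must, cannot, uncertain

# Toy ensemble: four clusterings of four objects (hypothetical data).
labelings = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]
coassoc = co_association(labelings)
must, cannot, uncertain = split_pairs(coassoc)
print("guessed must-link:", must)
print("guessed cannot-link:", cannot)
print("query candidates:", uncertain)
```

In this sketch only the pairs in `uncertain` would be passed to an oracle for labelling, which is one way to read the claim that uncertain co-associations reduce the supervision needed.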

Author information

Authors and Affiliations

University College Dublin, Ireland
Derek Greene & Pádraig Cunningham

Editor information

Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, Andrzej Skowron

Cite this paper

Greene, D., Cunningham, P. (2007). Constraint Selection by Committee: An Ensemble Approach to Identifying Informative Constraints for Semi-supervised Clustering. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_16

DOI: https://doi.org/10.1007/978-3-540-74958-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science, Computer Science (R0)
