Abstract
Cluster discovery is an essential part of many data mining applications. While the cluster discovery process is mainly unsupervised in nature, it can often be aided by a small amount of labeled data. A probabilistic model of the clustering structure is adopted, and a novel unified energy equation for clustering that incorporates both labeled and unlabeled data is introduced. The formulation is inspired by a force-field model that integrates the labeling constraint on labeled data with similarity information on unlabeled data for joint estimation. Experimental results show that good clusters can be identified using a small amount of labeled data.
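The abstract does not state the energy equation itself, so the following is only a minimal illustrative sketch of the general idea, not the paper's formulation. It assumes a Potts-style pairwise energy over a Gaussian similarity graph plus a penalty for violating known labels, minimized greedily by iterated conditional modes. All names and parameters (`similarity_matrix`, `energy`, `icm`, `sigma`, `lam`) are hypothetical choices made for illustration.

```python
import numpy as np

def similarity_matrix(X, sigma=1.0):
    # Gaussian similarity between every pair of points (assumed, not from the paper).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def energy(z, W, labeled_idx, labeled_y, lam=10.0):
    # Similarity term: a pair (i, j) placed in different clusters costs W[i, j].
    disagree = z[:, None] != z[None, :]
    pairwise = 0.5 * (W * disagree).sum()
    # Labeling term: each labeled point that violates its known label costs lam.
    constraint = lam * np.sum(z[labeled_idx] != labeled_y)
    return pairwise + constraint

def icm(W, k, labeled_idx, labeled_y, lam=10.0, n_sweeps=20):
    # Start every point at the label of its most similar labeled point.
    z = labeled_y[np.argmax(W[:, labeled_idx], axis=1)]
    z[labeled_idx] = labeled_y
    known = dict(zip(labeled_idx.tolist(), labeled_y.tolist()))
    for _ in range(n_sweeps):
        changed = False
        for i in range(W.shape[0]):
            # Local cost of putting point i in cluster c, all other points fixed.
            cost = np.array([W[i, z != c].sum() for c in range(k)])
            if i in known:
                cost += lam * (np.arange(k) != known[i])
            best = int(np.argmin(cost))
            if best != z[i]:
                z[i] = best
                changed = True
        if not changed:
            break
    return z

# Two small, well-separated groups with one labeled point in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labeled_idx = np.array([0, 3])
labeled_y = np.array([0, 1])
W = similarity_matrix(X)
z = icm(W, 2, labeled_idx, labeled_y)
```

In this sketch the weight `lam` sets how strongly the labeled points are trusted relative to the similarity structure. A greedy minimizer of this kind can settle in a local minimum, which is why the unlabeled points are initialized from their most similar labeled point rather than at random.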
Li, C. Guided Cluster Discovery with Markov Model. Applied Intelligence 22, 37–46 (2005). https://doi.org/10.1023/B:APIN.0000047382.74353.8f
DOI: https://doi.org/10.1023/B:APIN.0000047382.74353.8f