Abstract
In recent years, semi-supervised clustering has attracted growing research attention. For most semi-supervised clustering algorithms, a good initialization method yields high-quality seeds that help improve clustering accuracy. In practice, labeled samples are scarce while unlabeled ones are plentiful, yet most existing initialization methods set the unlabeled data aside, even though it may contain information useful for the clustering task. In this paper, we propose a novel initialization method that converts some of the unlabeled samples into labeled ones: the neighbors of the labeled samples are identified first, and the known labels are then propagated to those unlabeled neighbors. Experimental results show that the proposed initialization method can improve the performance of semi-supervised clustering.
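The two steps named in the abstract (find the neighbors of the labeled samples, then propagate the known labels to them) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose neighbor definition and propagation rule are not given here. It assumes Euclidean k-nearest neighbors, the convention that `-1` marks an unlabeled sample, and a tie-break in which the closest labeled sample wins. The names `expand_seeds` and `seed_centroids` and the toy data are hypothetical.

```python
import numpy as np

def expand_seeds(X, y, k=2):
    """Copy each labeled sample's label to its k nearest unlabeled neighbors.

    Assumptions (not from the paper): y uses -1 for "unlabeled", distance is
    Euclidean, and if several labeled samples claim the same neighbor the
    closest one wins.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = np.flatnonzero(y >= 0)
    unlabeled = np.flatnonzero(y < 0)
    new_y = y.copy()
    if labeled.size == 0 or unlabeled.size == 0:
        return new_y

    # Pairwise distances, shape (n_labeled, n_unlabeled).
    diff = X[labeled][:, None, :] - X[unlabeled][None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # best[col] is the smallest distance at which column col has been claimed.
    best = np.full(unlabeled.size, np.inf)
    k = min(k, unlabeled.size)
    for row, i in enumerate(labeled):
        for col in np.argsort(dist[row])[:k]:
            if dist[row, col] < best[col]:
                best[col] = dist[row, col]
                new_y[unlabeled[col]] = y[i]
    return new_y

def seed_centroids(X, y):
    """Mean of the samples carrying each label, usable as initial centers."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y[y >= 0])}

# Toy data: one labeled sample per class, five unlabeled samples.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.2],
              [2.5, 2.5]])
y = np.array([0, -1, -1, 1, -1, -1, -1])

y_seeded = expand_seeds(X, y, k=2)
centroids = seed_centroids(X, y_seeded)
print(y_seeded)
print(centroids)
```

In this toy run each class grows from one seed to three, while the point midway between the two groups stays unlabeled. The enlarged seed sets could then initialize the centers of a seeded k-means style algorithm.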
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, Y., Xuan, Z., Rong, L., Liu, M. (2010). A Novel Initialization Method for Semi-supervised Clustering. In: Bi, Y., Williams, MA. (eds) Knowledge Science, Engineering and Management. KSEM 2010. Lecture Notes in Computer Science, vol 6291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15280-1_30
DOI: https://doi.org/10.1007/978-3-642-15280-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15279-5
Online ISBN: 978-3-642-15280-1
eBook Packages: Computer Science, Computer Science (R0)