A Novel Initialization Method for Semi-supervised Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6291)

Abstract

Semi-supervised clustering has attracted growing attention in recent years. For most semi-supervised clustering algorithms, a good initialization method produces high-quality seeds that help improve clustering accuracy. In real-world settings, labeled samples are scarce while unlabeled ones are plentiful, yet most existing initialization methods set the unlabeled data aside, even though it may contain information useful for the clustering task. In this paper, we propose a novel initialization method that converts some of the unlabeled samples into labeled ones: the neighbors of the labeled samples are identified first, and the known labels are then propagated to the unlabeled ones. Experimental results show that the proposed initialization method can improve the performance of semi-supervised clustering.
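The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea (find neighbors of labeled samples, propagate labels to them, build seeds from the enlarged labeled set), not the authors' method. The Euclidean distance, the fixed `radius` neighborhood, the rule that a sample is labeled only when all nearby labeled samples agree, and the function names `expand_seeds` and `seed_centroids` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def expand_seeds(points, labels, radius):
    """Propagate known labels to nearby unlabeled points (illustrative only).

    points: list of coordinate tuples.
    labels: dict mapping point index -> label for the labeled samples.
    radius: assumed neighborhood size; the paper's neighbor rule is not
            given in the abstract.
    """
    expanded = dict(labels)
    for i, p in enumerate(points):
        if i in labels:
            continue
        # Labels of the originally labeled samples within `radius` of p.
        nearby = {labels[j] for j in labels if math.dist(p, points[j]) <= radius}
        # Assumption: adopt a label only when the neighborhood is unambiguous.
        if len(nearby) == 1:
            expanded[i] = nearby.pop()
    return expanded


def seed_centroids(points, assigned):
    """Average the points of each label to get initial cluster centers."""
    groups = defaultdict(list)
    for i, label in assigned.items():
        groups[label].append(points[i])
    return {
        label: tuple(sum(coord) / len(coord) for coord in zip(*members))
        for label, members in groups.items()
    }


# Toy data: two labeled samples, three unlabeled ones.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2), (2.5, 2.5)]
labels = {0: "a", 2: "b"}

expanded = expand_seeds(points, labels, radius=0.5)
centroids = seed_centroids(points, expanded)
```

In this toy run, the two unlabeled points close to a labeled sample inherit its label, while the point at (2.5, 2.5) has no labeled neighbor within the radius and stays unlabeled. The resulting centroids could then initialize a seeded clustering algorithm such as a seeded k-means.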

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, Y., Xuan, Z., Rong, L., Liu, M. (2010). A Novel Initialization Method for Semi-supervised Clustering. In: Bi, Y., Williams, MA. (eds) Knowledge Science, Engineering and Management. KSEM 2010. Lecture Notes in Computer Science, vol 6291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15280-1_30

  • DOI: https://doi.org/10.1007/978-3-642-15280-1_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15279-5

  • Online ISBN: 978-3-642-15280-1

  • eBook Packages: Computer Science, Computer Science (R0)
