Labeled samples are crucial in semi-supervised classification, but which samples should we choose to label? In other words, which samples, if labeled, would provide the most information? We propose a method to address this problem. First, we give each unlabeled example an initial class label using unsupervised learning. Then, by maximizing the mutual information, we choose the most informative samples as the user-specified labeled samples. Finally, we run a semi-supervised algorithm with these user-specified labeled samples to obtain the final classification. Experimental results on synthetic data show that our algorithm achieves satisfactory classification results with active query selection.
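The pipeline above (cluster the unlabeled data, then query the most informative points) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_queries`, the plain k-means initialization, and the use of soft-assignment entropy as a stand-in for the paper's mutual-information criterion are all assumptions made for the example.

```python
import numpy as np

def select_queries(X, n_clusters=2, n_queries=3, seed=0):
    """Pick the samples whose initial (unsupervised) labeling is most uncertain.

    Hypothetical sketch: initial class labels come from a few k-means
    iterations, and "informativeness" is approximated by the entropy of a
    soft cluster assignment (a proxy for the mutual-information criterion).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]

    # Step 1: unsupervised initial labels via simple k-means.
    for _ in range(10):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)

    # Step 2: soft assignment (softmax over negative distances) and its
    # entropy; points near a cluster boundary score highest.
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    p = np.exp(-d)
    p /= p.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)

    # Step 3: return indices of the most uncertain samples to be labeled
    # by the user, before running the semi-supervised classifier.
    return np.argsort(entropy)[::-1][:n_queries]
```

In practice the returned indices would be shown to the user for labeling, and the resulting labeled set fed to a graph-based semi-supervised method such as those cited in the paper.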


Keywords: Mutual Information · Class Label · Synthetic Data · Unsupervised Learning · Unlabeled Data



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

Jiao Wang, Siwei Luo
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
