Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning

  • Haoji Hu
  • Chaofeng Sha
  • Xiaoling Wang
  • Aoying Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7235)


Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to predefined classes beyond the labeled data categories. This problem has been widely studied in recent years, and semi-supervised learning is an efficient solution for learning from positive and unlabeled examples (PU learning). Among the semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. This paper proposes an automatic KL-divergence-based semi-supervised learning method that uses knowledge of the unlabeled data distribution. In addition, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of the earlier methods. The experimental results show that (1) data distribution information is very helpful for the semi-supervised PU learning method; and (2) the proposed framework achieves higher precision than the state-of-the-art method.
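For readers unfamiliar with the quantity the method is named after, the following is a minimal sketch of the Kullback-Leibler divergence between two discrete distributions, here imagined as class proportions. The abstract does not say how the divergence enters the algorithm, so the function name `kl_divergence`, the `eps` guard, and the example proportions are illustrative assumptions, not the authors' method.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(p || q) in nats for two discrete distributions.

    p and q are equal-length sequences of probabilities. Terms with
    p_i == 0 contribute nothing; q_i is floored at eps (an assumption
    made here purely to avoid division by zero).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical class proportions (positive, negative), for illustration only.
assumed_prior = [0.5, 0.5]
estimated_from_unlabeled = [0.9, 0.1]
divergence = kl_divergence(assumed_prior, estimated_from_unlabeled)
```

The divergence is zero when the two distributions coincide and grows as they move apart, which is why it is a natural way to compare an assumed class proportion with one estimated from unlabeled data.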


Keywords: Unlabeled Data; Positive Instance; Negative Instance; Class Proportion; Reuter Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Haoji Hu (1)
  • Chaofeng Sha (2)
  • Xiaoling Wang (1)
  • Aoying Zhou (1, 2)

  1. Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, China
  2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China
