World Wide Web, Volume 17, Issue 4, pp 493–510

A unified framework for semi-supervised PU learning

  • Haoji Hu
  • Chaofeng Sha
  • Xiaoling Wang (corresponding author)
  • Aoying Zhou


Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while unlabeled data serves as the test set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class but not to the labeled data categories. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient solution for learning from positive and unlabeled examples. Among the existing semi-supervised PU learning methods, however, it is hard to choose a single approach that fits every unlabeled data distribution. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that utilizes knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning, and (2) the proposed framework achieves higher precision than the state-of-the-art method.


Keywords: Data mining, Semi-supervised learning, PU learning





Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Haoji Hu (1)
  • Chaofeng Sha (2)
  • Xiaoling Wang (1)
  • Aoying Zhou (1)
  1. Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China
  2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
