Abstract
This paper studies the problem of Positive Unlabeled learning (PU learning), in which only positive and unlabeled examples are available for training. Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) have been extended to PU learning algorithms (PNB and PTAN). However, both require a user-specified parameter, which is difficult for users to provide in practice. Following Elkan and Noto [2], we estimate this parameter under the "selected completely at random" assumption and reformulate the two algorithms accordingly. Furthermore, we extend three supervised algorithms, Averaged One-Dependence Estimators (AODE), Hidden Naive Bayes (HNB) and Full Bayesian network Classifier (FBC), to PU learning algorithms (PAODE, PHNB and PFBC, respectively). Experimental results on 20 UCI datasets show that the performance of the Bayesian algorithms for PU learning is comparable to that of the corresponding supervised algorithms in most cases. Additionally, PNB and PFBC are more robust against unlabeled data, and PFBC generally performs the best.
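The parameter-estimation step described above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions. The user-specified parameter of PNB (in the formulation of Denis et al.) is taken to be the positive class prior P(y=1). Attributes are discrete, and Laplace smoothing is applied. The labeled-vs-unlabeled classifier g is itself a naive Bayes. The constant c = P(s=1 | y=1) is estimated Elkan-Noto style as the mean of g(x) over labeled positives, computed here on the training positives rather than a held-out set. Under "selected completely at random", P(y=1) = P(s=1) / c. All function names are hypothetical.

```python
import math
from collections import Counter


def attribute_domains(rows):
    """Observed values of each discrete attribute (position in the tuple)."""
    return [sorted({r[i] for r in rows}, key=str) for i in range(len(rows[0]))]


def value_freqs(rows, domains, alpha=1.0):
    """Laplace-smoothed estimates of P(x_i = v) from `rows`, one dict per attribute."""
    tables = []
    for i, dom in enumerate(domains):
        counts = Counter(r[i] for r in rows)
        total = len(rows) + alpha * len(dom)
        tables.append({v: (counts[v] + alpha) / total for v in dom})
    return tables


def nb_posterior(x, prior, pos_tables, neg_tables):
    """P(class = 1 | x) for a two-class naive Bayes model (computed in log space)."""
    log_pos = math.log(prior) + sum(math.log(pos_tables[i][v]) for i, v in enumerate(x))
    log_neg = math.log(1 - prior) + sum(math.log(neg_tables[i][v]) for i, v in enumerate(x))
    m = max(log_pos, log_neg)
    return math.exp(log_pos - m) / (math.exp(log_pos - m) + math.exp(log_neg - m))


def estimate_c(pos, unl, domains):
    """Estimate c = P(s=1 | y=1) as the mean of g(x) over labeled positives, where g
    is a naive Bayes separating labeled (s=1) from unlabeled (s=0) examples.
    Elkan and Noto average over a held-out set; the training positives are reused here."""
    prior_s = len(pos) / (len(pos) + len(unl))
    g_pos, g_unl = value_freqs(pos, domains), value_freqs(unl, domains)
    return sum(nb_posterior(x, prior_s, g_pos, g_unl) for x in pos) / len(pos)


def train_pnb(pos, unl, alpha=1.0):
    """PNB-style model whose positive class prior is estimated instead of user-supplied."""
    rows = pos + unl
    domains = attribute_domains(rows)
    c = estimate_c(pos, unl, domains)
    # Under SCAR, P(y=1) = P(s=1) / c; clip so both log-priors stay finite.
    prior = min(max(len(pos) / (len(rows) * c), 1e-6), 1 - 1e-6)
    pos_tables = value_freqs(pos, domains, alpha)   # labeled positives ~ random sample of positives
    all_tables = value_freqs(rows, domains, alpha)  # P(x_i) from labeled + unlabeled data
    neg_tables = []
    for i, dom in enumerate(domains):
        # P(x_i|y=0) is proportional to P(x_i) - P(y=1) P(x_i|y=1); clip, then renormalize.
        raw = {v: max(all_tables[i][v] - prior * pos_tables[i][v], 1e-9) for v in dom}
        z = sum(raw.values())
        neg_tables.append({v: p / z for v, p in raw.items()})
    return prior, pos_tables, neg_tables


# Toy data: attribute 0 tracks the class, attribute 1 is noise. 20 of 40 positives are labeled.
pos = [('a', 'x')] * 10 + [('a', 'y')] * 10
unl = [('a', 'x')] * 10 + [('a', 'y')] * 10 + [('b', 'x')] * 30 + [('b', 'y')] * 30
c = estimate_c(pos, unl, attribute_domains(pos + unl))
model = train_pnb(pos, unl)
print(f"c = {c:.3f}, P(y=1) = {model[0]:.3f}")
print(nb_posterior(('a', 'x'), *model), nb_posterior(('b', 'y'), *model))
```

On this toy data the true values are c = 0.5 and P(y=1) = 0.4. The sketch estimates c = 41/85 (about 0.482) and P(y=1) = 17/41 (about 0.415), with the gap coming from smoothing. The sketch assumes that every attribute value seen at prediction time also appeared in training.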
This work is supported by the National Natural Science Foundation of China (60873196) and Chinese Universities Scientific Fund (QN2009092).
References
Zhang, D., Lee, W.S.: A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples. In: Proc. of UKCI 2005, pp. 83–87 (2005)
Elkan, C., Noto, K.: Learning Classifiers from Only Positive and Unlabeled Data. In: Proc. of SIGKDD 2008, pp. 213–220 (2008)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning 29(2), 131–163 (1997)
Webb, G.I., Boughton, J.R., Wang, Z.: Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning 58(1), 5–24 (2005)
Jiang, L., Zhang, H., Cai, Z.: A Novel Bayes Model: Hidden Naive Bayes. IEEE Transactions on Knowledge and Data Engineering 21(10), 1361–1371 (2009)
Su, J., Zhang, H.: Full Bayesian Network Classifiers. In: Proc. of the 23rd ICML, pp. 897–904 (2006)
Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., Williamson, R.: Estimating the Support of a High-Dimensional Distribution. Neural Computation 13(7), 1443–1471 (2001)
Yu, H., Han, J., Chang, K.C.: PEBL: Positive Example Based Learning for Web Page Classification Using SVM. In: Proc. of the 8th SIGKDD, pp. 239–248 (2002)
Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially Supervised Classification of Text Documents. In: Proc. of the 19th ICML, pp. 387–394 (2002)
Li, X., Liu, B.: Learning to Classify Texts Using Positive and Unlabeled Data. In: Proc. of the 18th IJCAI, pp. 587–592 (2003)
Lee, W.S., Liu, B.: Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. In: Proc. of the 20th ICML, pp. 448–455 (2003)
Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proc. of the 3rd ICDM, pp. 179–186 (2003)
Denis, F., Gilleron, R., Tommasi, M.: Text Classification from Positive and Unlabeled Examples. In: Proc. of the 9th IPMU, pp. 1927–1934 (2002)
Denis, F., Gilleron, R., Letouzey, F.: Learning from Positive and Unlabeled Examples. Theoretical Computer Science 348(1), 70–83 (2005)
Zhang, Y., Li, X., Orlowska, M.: One-Class Classification of Text Streams with Concept Drift. In: Proc. of ICDMW, pp. 116–125 (2008)
Li, X.L., Yu, P.S., Liu, B., Ng, S.K.: Positive Unlabeled Learning for Data Stream Classification. In: Proc. of the 9th SIAM SDM, pp. 257–268 (2009)
He, J., Zhang, Y., Li, X., Wang, Y.: Naive Bayes Classifier for Positive Unlabeled Learning with Uncertainty. In: Proc. of the 10th SIAM SDM, pp. 361–372 (2010)
Calvo, B., Larranaga, P., Lozano, J.A.: Learning Bayesian Classifiers from Positive and Unlabeled Examples. Pattern Recognition Letters 28(16), 2375–2384 (2007)
Zadrozny, B., Elkan, C.: Transforming Classifier Scores into Accurate Multiclass Probability Estimates. In: Proc. of the 8th SIGKDD, pp. 694–699 (2002)
Blake, C.L., Merz, C.J.: UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
Zhang, H., Jiang, L., Su, J.: Augmenting Naive Bayes for Ranking. In: Proc. of the 22nd ICML, pp. 1020–1027 (2005)
Zhang, H., Jiang, L., Su, J.: Learning Weighted Naive Bayes with Accurate Ranking. In: Proc. of the 4th ICDM, pp. 567–570 (2004)
Su, J., Zhang, H.: Learning Conditional Independence Tree for Ranking. In: Proc. of the 4th ICDM, pp. 531–534 (2004)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, J., Zhang, Y., Li, X., Wang, Y. (2011). Bayesian Classifiers for Positive Unlabeled Learning. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_9
DOI: https://doi.org/10.1007/978-3-642-23535-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)