Bayesian Classifiers for Positive Unlabeled Learning

He, Jiazhen; Zhang, Yang; Li, Xue; Wang, Yong

doi:10.1007/978-3-642-23535-1_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

This paper studies the problem of Positive Unlabeled learning (PU learning), where positive and unlabeled examples are used for training. Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) have been extended to PU learning algorithms (PNB and PTAN). However, they require a user-specified parameter, which is difficult for the user to provide in practice. We estimate this parameter following [2] under the “selected completely at random” assumption and reformulate these two algorithms accordingly. Furthermore, we extend the supervised algorithms Averaged One-Dependence Estimators (AODE), Hidden Naive Bayes (HNB) and Full Bayesian network Classifier (FBC) to PU learning algorithms (PAODE, PHNB and PFBC, respectively). Experimental results on 20 UCI datasets show that the performance of the Bayesian algorithms for PU learning is comparable to that of the corresponding supervised ones in most cases. Additionally, PNB and PFBC are more robust against unlabeled data, and PFBC generally performs the best.
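
The estimation step borrowed from [2] can be illustrated with a minimal sketch. It is not the chapter's own procedure, which is not reproduced on this page; it only shows the general idea from Elkan and Noto. Write s = 1 for a labeled example and y = 1 for a true positive. Under the “selected completely at random” assumption, p(s = 1 | x, y = 1) is a constant c, so a model g(x) ≈ p(s = 1 | x) trained on labeled versus unlabeled data gives p(y = 1 | x) ≈ g(x) / c, and the positive class prior can be estimated as p(s = 1) / c. The synthetic data, the small categorical naive Bayes model used for g, and all function and variable names below are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_nb(X, s, alpha=1.0):
    """Fit a categorical naive Bayes model of p(s | x) with Laplace smoothing."""
    n_features = len(X[0])
    values = [sorted({row[j] for row in X}) for j in range(n_features)]
    class_counts = Counter(s)
    feat_counts = {c: [Counter() for _ in range(n_features)] for c in (0, 1)}
    for row, label in zip(X, s):
        for j, v in enumerate(row):
            feat_counts[label][j][v] += 1
    return values, class_counts, feat_counts, alpha


def predict_labeled_proba(model, row):
    """Return g(x), the model's estimate of p(s = 1 | x)."""
    values, class_counts, feat_counts, alpha = model
    total = sum(class_counts.values())
    score = {}
    for c in (0, 1):
        p = (class_counts[c] + alpha) / (total + 2 * alpha)
        for j, v in enumerate(row):
            p *= (feat_counts[c][j][v] + alpha) / (
                class_counts[c] + alpha * len(values[j])
            )
        score[c] = p
    return score[1] / (score[0] + score[1])


def posterior_positive(model, row, c_hat):
    """Estimate p(y = 1 | x) as g(x) / c, clipped to at most 1."""
    return min(1.0, predict_labeled_proba(model, row) / c_hat)


# Synthetic PU data (illustrative assumption): the true labels y are hidden,
# and only a random fraction C_TRUE of the positives receives s = 1.
random.seed(0)
C_TRUE, PRIOR_TRUE = 0.5, 0.4
X, s = [], []
for _ in range(4000):
    y = 1 if random.random() < PRIOR_TRUE else 0
    p_on = 0.9 if y == 1 else 0.1
    X.append(tuple(int(random.random() < p_on) for _ in range(5)))
    s.append(1 if y == 1 and random.random() < C_TRUE else 0)

# Train g on one part and estimate c on held-out labeled positives.
X_train, s_train = X[:3000], s[:3000]
X_val, s_val = X[3000:], s[3000:]
model = fit_nb(X_train, s_train)

val_pos = [row for row, label in zip(X_val, s_val) if label == 1]
c_hat = sum(predict_labeled_proba(model, row) for row in val_pos) / len(val_pos)

# p(y = 1) is estimated as p(s = 1) / c, clipped to at most 1.
prior_hat = min(1.0, (sum(s) / len(s)) / c_hat)

print(f"estimated c = {c_hat:.3f}, estimated positive prior = {prior_hat:.3f}")
```

Averaging g over held-out labeled positives tends to underestimate c when the positive and negative classes overlap, which is why both derived quantities are clipped to 1 in this sketch.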

This work is supported by the National Natural Science Foundation of China (60873196) and Chinese Universities Scientific Fund (QN2009092).

References

1. Zhang, D., Lee, W.S.: A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples. In: Proc. of UKCI 2005, pp. 83–87 (2005)
2. Elkan, C., Noto, K.: Learning Classifiers from Only Positive and Unlabeled Data. In: Proc. of SIGKDD 2008, pp. 213–220 (2008)
3. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29(2), 131–163 (1997)
4. Webb, G.I., Boughton, J.R., Wang, Z.: Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning 58(1), 5–24 (2005)
5. Jiang, L., Zhang, H., Cai, Z.: A Novel Bayes Model: Hidden Naive Bayes. IEEE Transactions on Knowledge and Data Engineering 21(10), 1361–1371 (2009)
6. Su, J., Zhang, H.: Full Bayesian Network Classifiers. In: Proc. of the 23rd ICML, pp. 897–904 (2006)
7. Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., Williamson, R.: Estimating the Support of a High-Dimensional Distribution. Neural Computation 13(7), 1443–1471 (2001)
8. Yu, H., Han, J., Chang, K.C.: PEBL: Positive Example Based Learning for Web Page Classification Using SVM. In: Proc. of the 8th SIGKDD, pp. 239–248 (2002)
9. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially Supervised Classification of Text Documents. In: Proc. of the 9th ICML, pp. 387–394 (2002)
10. Li, X., Liu, B.: Learning to Classify Texts Using Positive and Unlabeled Data. In: Proc. of the 18th IJCAI, pp. 587–592 (2003)
11. Lee, W.S., Liu, B.: Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. In: Proc. of the 3rd ICDE, pp. 448–455 (2003)
12. Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proc. of the 3rd ICDM, pp. 179–186 (2003)
13. Denis, F., Gilleron, R., Tommasi, M.: Text Classification from Positive and Unlabeled Examples. In: Proc. of the 9th IPMU, pp. 1927–1934 (2002)
14. Denis, F., Gilleron, R., Letouzey, F.: Learning from Positive and Unlabeled Examples. Theoretical Computer Science 38(1), 70–83 (2005)
15. Zhang, Y., Li, X., Orlowska, M.: One-Class Classification of Text Streams with Concept Drift. In: Proc. of ICDMW, pp. 116–125 (2008)
16. Li, X.L., Yu, P.S., Liu, B., Ng, S.K.: Positive Unlabeled Learning for Data Stream Classification. In: Proc. of the 9th SIAM SDM, pp. 257–268 (2009)
17. He, J., Zhang, Y., Li, X., Wang, Y.: Naive Bayes Classifier for Positive Unlabeled Learning with Uncertainty. In: Proc. of the 10th SIAM SDM, pp. 361–372 (2010)
18. Calvo, B., Larranaga, P., Lozano, J.A.: Learning Bayesian Classifiers from Positive and Unlabeled Examples. Pattern Recognition Letters 28(16), 2375–2384 (2007)
19. Zadrozny, B., Elkan, C.: Transforming Classifier Scores into Accurate Multiclass Probability Estimates. In: Proc. of the 8th SIGKDD, pp. 694–699 (2002)
20. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
21. Zhang, H., Jiang, L., Su, J.: Augmenting Naive Bayes for Ranking. In: Proc. of the 22nd ICML, pp. 1020–1027 (2005)
22. Zhang, H., Jiang, L., Su, J.: Learning Weighted Naive Bayes with Accurate Ranking. In: Proc. of the 4th ICDM, pp. 567–570 (2004)
23. Su, J., Zhang, H.: Learning Conditional Independence Tree for Ranking. In: Proc. of the 4th ICDM, pp. 531–534 (2004)

Author information

Authors and Affiliations

College of Information Engineering, Northwest A&F University, China
Jiazhen He & Yang Zhang
State Key Laboratory for Novel Software Technology, Nanjing University, China
Yang Zhang
School of Information Technology and Electrical Engineering, The University of Queensland, Australia
Xue Li
School of Computer, Northwestern Polytechnical University, China
Yong Wang

Editor information

Editors and Affiliations

Microsoft Research Asia, 5 Danling Rd., Haidian District, 100190, Beijing, China
Haixun Wang
Computer School, Wuhan University, 16 Luojiashan Road, 430072, Hubei, China
Shijun Li
Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, 060-0814, Hokkaido, Sapporo, Japan
Satoshi Oyama
College of Information Science and Technology, Drexel University, 19104, Philadelphia, PA, USA
Xiaohua Hu
State Key Laboratory of Software Engineering, Wuhan University, 16 Luojiashan Road, 430072, Wuhan, Hubei, China
Tieyun Qian

About this paper

Cite this paper

He, J., Zhang, Y., Li, X., Wang, Y. (2011). Bayesian Classifiers for Positive Unlabeled Learning. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_9

DOI: https://doi.org/10.1007/978-3-642-23535-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science (R0)
