
Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric

  • Yongchan Kwon
  • Wonyoung Kim
  • Masashi Sugiyama
  • Myunghee Cho Paik
Part of the topical collection: Special Issue of the ACML 2019 Journal Track

Abstract

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning). Recent PU learning methods have shown strong theoretical and empirical performance. However, most existing algorithms may not be suitable for large-scale datasets because they require repeated computation of a large Gram matrix or extensive hyperparameter optimization. In this paper, we propose a computationally efficient and theoretically grounded PU learning algorithm. The proposed algorithm produces a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk. The obtained estimation error bound is sharper than existing results, and the derived excess risk bound has an explicit form that vanishes as the sample sizes increase. Finally, we conduct extensive numerical experiments using both synthetic and real datasets, demonstrating the improved accuracy, scalability, and robustness of the proposed algorithm.
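The abstract notes that the classifier is available in closed form when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. The sketch below is not the paper's estimator; it is a minimal illustration of one way an analytic, kernel-based PU decision rule can be formed, namely as a weighted difference of mean kernel evaluations against the positive and unlabeled samples, which acts as a kernel-smoothed analogue of the Bayes rule pi * p(x | y = +1) > (1/2) * p(x) when the class prior is known. The function names, the RBF kernel choice, the prior `pi`, the bandwidth `gamma`, and the 1/2 threshold are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def analytic_pu_score(X_test, X_pos, X_unl, pi, gamma=1.0):
    """Closed-form PU score: weighted difference of mean kernel evaluations
    against the positive and unlabeled samples (a kernel-smoothed analogue of
    the Bayes rule pi * p(x|+1) > 0.5 * p(x), assuming the prior pi is known)."""
    k_pos = rbf_kernel(X_test, X_pos, gamma).mean(axis=1)  # estimate of E_P[k(x, .)]
    k_unl = rbf_kernel(X_test, X_unl, gamma).mean(axis=1)  # estimate of E_U[k(x, .)]
    return pi * k_pos - 0.5 * k_unl

def analytic_pu_predict(X_test, X_pos, X_unl, pi, gamma=1.0):
    """Label +1 where the score is positive, -1 otherwise."""
    scores = analytic_pu_score(X_test, X_pos, X_unl, pi, gamma)
    return np.where(scores > 0, 1, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=+1.0, size=(100, 2))            # labeled positives
    X_unl = np.vstack([rng.normal(+1.0, size=(50, 2)),     # unlabeled mixture
                       rng.normal(-1.0, size=(50, 2))])
    X_test = np.array([[1.0, 1.0], [-1.0, -1.0]])
    print(analytic_pu_predict(X_test, X_pos, X_unl, pi=0.5, gamma=0.5))
```

In practice the class prior pi is itself unknown and must be estimated from the data (e.g., via mixture proportion estimation), and the kernel bandwidth is chosen by validation; the point of the sketch is only that, once these quantities are fixed, no iterative optimization is needed to evaluate the decision rule.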

Keywords

Positive and unlabeled learning · Integral probability metric · Excess risk bound · Approximation error · Reproducing kernel Hilbert space


Acknowledgements

YK, WK, and MCP were supported by the National Research Foundation of Korea under Grant NRF-2017R1A2B4008956. MS was supported by JST CREST JPMJCR1403.


Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  • Yongchan Kwon (1)
  • Wonyoung Kim (1)
  • Masashi Sugiyama (2, 3)
  • Myunghee Cho Paik (1), corresponding author

  1. Department of Statistics, Seoul National University, Seoul, South Korea
  2. Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
  3. Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
