Abstract
We propose a new classification method for positive and unlabeled (PU) data, called the LassoJoint classification procedure, which combines the thresholded Lasso approach in the first two steps with the joint method based on logistic regression, introduced by Teisseyre et al. [12], in the last step. We prove that, under some regularity conditions, our procedure satisfies the screening property. We also conduct a simulation study to compare the proposed classification procedure with the oracle method, and verify its prediction accuracy on selected real datasets.
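The three-step structure described above can be sketched as follows. This is an illustrative Python approximation only, under assumptions not taken from the paper: the data-generating parameters, the regularization strength `C`, and the threshold `tau` are arbitrary choices, and the final step here fits a plain logistic regression on the selected features, whereas the actual procedure uses the joint PU logistic method of Teisseyre et al. [12].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic PU data (illustrative parameters, not from the paper):
# true labels y are hidden; we only observe s, where a positive example
# is labeled with frequency c (the SCAR assumption).
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]               # 3 relevant features
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob)                  # true (hidden) labels
c = 0.5                                    # label frequency
s = y * rng.binomial(1, c, size=n)         # observed PU labels

# Step 1: Lasso (L1-penalised logistic regression) fitted to the PU labels s.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, s)
coef = lasso.coef_.ravel()

# Step 2: threshold the Lasso coefficients to obtain the active set
# (the screening property asserts the true support survives this step).
tau = 0.1                                  # illustrative threshold
support = np.flatnonzero(np.abs(coef) > tau)

# Step 3: refit on the selected features only. A plain logistic fit
# stands in here for the joint PU method of Teisseyre et al.
clf = LogisticRegression().fit(X[:, support], s)
```

The screening property proved in the paper corresponds to `support` containing all truly relevant features (possibly with some noise features, which the final low-dimensional fit can tolerate).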
References
Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey. Mach. Learn. 109(4), 719–760 (2020). https://doi.org/10.1007/s10994-020-05877-5
Friedman, J., Hastie, T., Simon, N., Tibshirani, R.: Glmnet: Lasso and elastic-net regularized generalized linear models. R package version 2.0 (2015)
Furmańczyk, K., Rejchel, W.: High-dimensional linear model selection motivated by multiple testing. Statistics 54, 152–166 (2020)
Furmańczyk, K., Rejchel, W.: Prediction and variable selection in high-dimensional misspecified classification. Entropy 22(5), 543 (2020)
Guo, T., et al.: On positive-unlabeled classification in GAN. In: CVPR (2020)
Hastie, T., Fithian, W.: Inference from presence-only data; the ongoing controversy. Ecography 36, 864–867 (2013)
Hou, M., Chaib-draa, B., Li, C., Zhao, Q.: Generative adversarial positive-unlabeled learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018) (2018)
Kubkowski, M.: Misspecification of binary regression model: properties and inferential procedures. Ph.D. thesis, Warsaw University of Technology, Warsaw (2019)
Kubkowski, M., Mielniczuk, J.: Active set of predictors for misspecified logistic regression. Statistics 51, 1023–1045 (2017)
Mordelet, F., Vert, J.P.: A bagging SVM to learn from positive and unlabeled examples. Pattern Recogn. Lett. 37, 201–209 (2013)
Song, H., Raskutti, G.: High-dimensional variable selection with presence-only data. arXiv:1711.08129v3 (2018)
Teisseyre, P., Mielniczuk, J., Łazęcka, M.: Different strategies of fitting logistic regression for positive and unlabelled data. In: Krzhizhanovskaya, V.V., et al. (eds.) ICCS 2020. LNCS, vol. 12140, pp. 3–17. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50423-6_1
Teisseyre, P.: Pulogistic repository. https://github.com/teisseyrep/Pulogistic. Accessed 1 Jan 2021
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 267–288 (1996)
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Furmańczyk, K., Dudziński, M., Dziewa-Dawidczyk, D. (2021). Some Proposal of the High Dimensional PU Learning Classification Procedure. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science(), vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_2
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2