Abstract
We propose a new classification method for positive and unlabeled (PU) data, called the LassoJoint classification procedure, which combines the thresholded Lasso approach in the first two steps with the joint method based on logistic regression, introduced by Teisseyre et al. [12], in the last step. We prove that, under some regularity conditions, our procedure satisfies the screening property. We also conduct a simulation study to compare the proposed classification procedure with the oracle method, and verify its prediction accuracy on selected real datasets.
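The three-step structure described above can be sketched as follows. This is an illustrative Python approximation only, under assumptions not taken from the paper: the data-generating parameters, the regularization strength `C`, and the threshold `tau` are arbitrary choices, and the final step here fits a plain logistic regression on the selected features, whereas the actual procedure uses the joint PU logistic method of Teisseyre et al. [12].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic PU data (illustrative parameters, not from the paper):
# true labels y are hidden; we only observe s, where a positive example
# is labeled with frequency c (the SCAR assumption).
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]               # 3 relevant features
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob)                  # true (hidden) labels
c = 0.5                                    # label frequency
s = y * rng.binomial(1, c, size=n)         # observed PU labels

# Step 1: Lasso (L1-penalised logistic regression) fitted to the PU labels s.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, s)
coef = lasso.coef_.ravel()

# Step 2: threshold the Lasso coefficients to obtain the active set
# (the screening property asserts the true support survives this step).
tau = 0.1                                  # illustrative threshold
support = np.flatnonzero(np.abs(coef) > tau)

# Step 3: refit on the selected features only. A plain logistic fit
# stands in here for the joint PU method of Teisseyre et al.
clf = LogisticRegression().fit(X[:, support], s)
```

The screening property proved in the paper corresponds to `support` containing all truly relevant features (possibly with some noise features, which the final low-dimensional fit can tolerate).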
References
Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey. Mach. Learn. 109(4), 719–760 (2020). https://doi.org/10.1007/s10994-020-05877-5
Friedman, J., Hastie, T., Simon, N., Tibshirani, R.: Glmnet: Lasso and elastic-net regularized generalized linear models. R package version 2.0 (2015)
Furmańczyk, K., Rejchel, W.: High-dimensional linear model selection motivated by multiple testing. Statistics 54, 152–166 (2020)
Furmańczyk, K., Rejchel, W.: Prediction and variable selection in high-dimensional misspecified classification. Entropy 22(5), 543 (2020)
Guo, T., et al.: On positive-unlabeled classification in GAN. In: CVPR (2020)
Hastie, T., Fithian, W.: Inference from presence-only data; the ongoing controversy. Ecography 36, 864–867 (2013)
Hou, M., Chaib-draa, B., Li, C., Zhao, Q.: Generative adversarial positive-unlabeled learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018) (2018)
Kubkowski, M.: Misspecification of binary regression model: properties and inferential procedures. Ph.D. thesis, Warsaw University of Technology, Warsaw (2019)
Kubkowski, M., Mielniczuk, J.: Active set of predictors for misspecified logistic regression. Statistics 51, 1023–1045 (2017)
Mordelet, F., Vert, J.P.: A bagging SVM to learn from positive and unlabeled examples. Pattern Recogn. Lett. 37, 201–209 (2013)
Song, H., Raskutti, G.: High-dimensional variable selection with presence-only data. arXiv:1711.08129v3 (2018)
Teisseyre, P., Mielniczuk, J., Łazęcka, M.: Different strategies of fitting logistic regression for positive and unlabelled data. In: Krzhizhanovskaya, V.V., et al. (eds.) ICCS 2020. LNCS, vol. 12140, pp. 3–17. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50423-6_1
Teisseyre, P.: Pulogistic repository. https://github.com/teisseyrep/Pulogistic. Accessed 1 Jan 2021
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 267–288 (1996)
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Furmańczyk, K., Dudziński, M., Dziewa-Dawidczyk, D. (2021). Some Proposal of the High Dimensional PU Learning Classification Procedure. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science(), vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_2
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2