USMOTE: A Synthetic Data-Set-Based Method Improving Imbalanced Learning

Wang, Junyi

doi:10.1007/978-3-031-31775-0_57

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 173)

Included in the following conference series:

The International Conference on Cyber Security Intelligence and Analytics

Abstract

Imbalanced learning, in which a data set contains a large number of normal samples and only a small percentage of abnormal ones, plays an important role in daily life. Machine learning models such as Decision Tree and Logistic Regression have been widely applied to such imbalanced data. However, model performance is negatively affected by the severe imbalance. To address this problem, sampling methods are used to balance the data sets. This work combines random undersampling with SMOTE (Synthetic Minority Over-sampling Technique) to synthetically modify the data sets used to train models, which achieves better recall_score performance in experiments. Additionally, we correct a mistake found in other works on sampling methods, which evaluate models on the transformed data set; this defeats the original purpose of sampling. Finally, we improve the Logistic Regression algorithm using this data-set-based technique, allowing it to perform better on imbalanced data.
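
The abstract does not give the exact sampling ratios or procedure, so the following is only a minimal sketch of the general idea: randomly undersample the majority class, then use SMOTE-style interpolation to raise the minority class to the same size, applying both steps to the training split only so that evaluation happens on untransformed data. The function names, the `n_majority_keep` and `k` parameters, the 0/1 label convention, and the toy data are all assumptions made for illustration, not the chapter's implementation.

```python
import math
import random

def random_undersample(majority, n_keep, rng):
    """Keep n_keep majority rows chosen uniformly without replacement."""
    return rng.sample(majority, min(n_keep, len(majority)))

def smote(minority, n_new, k, rng):
    """Create n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (the basic SMOTE idea)."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def undersample_then_smote(X, y, n_majority_keep, k=5, seed=0):
    """Assumed combination: shrink class 0, then grow class 1 to match it."""
    rng = random.Random(seed)
    majority = [x for x, label in zip(X, y) if label == 0]
    minority = [x for x, label in zip(X, y) if label == 1]
    kept = random_undersample(majority, n_majority_keep, rng)
    n_new = max(0, len(kept) - len(minority))
    minority_aug = minority + smote(minority, n_new, k, rng)
    return kept + minority_aug, [0] * len(kept) + [1] * len(minority_aug)

def recall(y_true, y_pred):
    """Recall of the positive (minority) class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

# Toy data (hypothetical): 200 majority rows and 10 minority rows in 2-D.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 200 + [1] * 10

# Split first, then resample the training part only; the test part keeps
# its original imbalance and is what a trained model would be scored on.
X_train, y_train = X[:150] + X[200:207], y[:150] + y[200:207]
X_test, y_test = X[150:200] + X[207:], y[150:200] + y[207:]
X_res, y_res = undersample_then_smote(X_train, y_train, n_majority_keep=50)
```

A classifier such as Logistic Regression would then be fitted on `X_res`, `y_res`, and `recall` would be computed from its predictions on the untouched `X_test`, `y_test`. In practice, a library such as imbalanced-learn (`RandomUnderSampler`, `SMOTE`) would normally replace the hand-written helpers above.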

Author information

Authors and Affiliations

Xi’an Jiaotong University, Xi’an, 710049, China
Junyi Wang

Corresponding author

Correspondence to Junyi Wang.

Editor information

Editors and Affiliations

Shanghai Polytechnic University, Shanghai, China
Zheng Xu
United Arab Emirates University, Abu Dhabi, United Arab Emirates
Saed Alrabaee
Stratesys, Madrid, Spain
Octavio Loyola-González
Telkom University, Cileunyi, Jawa Barat, Indonesia
Niken Dwi Wahyu Cahyani
Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Nurul Hidayah Ab Rahman

About this paper

Cite this paper

Wang, J. (2023). USMOTE: A Synthetic Data-Set-Based Method Improving Imbalanced Learning. In: Xu, Z., Alrabaee, S., Loyola-González, O., Cahyani, N.D.W., Ab Rahman, N.H. (eds) Cyber Security Intelligence and Analytics. CSIA 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 173. Springer, Cham. https://doi.org/10.1007/978-3-031-31775-0_57

DOI: https://doi.org/10.1007/978-3-031-31775-0_57
Published: 30 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31774-3
Online ISBN: 978-3-031-31775-0
eBook Packages: Intelligent Technologies and Robotics (R0)
