A Novel HEOMGA Approach for Class Imbalance Problem in the Application of Customer Churn Prediction

AlShourbaji, Ibrahim; Helian, Na; Sun, Yi; Alhameed, Mohammed

doi:10.1007/s42979-021-00850-y


Original Research
Published: 17 September 2021

SN Computer Science, volume 2, article number 464 (2021)

Ibrahim AlShourbaji¹,² (ORCID: orcid.org/0000-0002-6485-8415),
Na Helian¹,
Yi Sun¹ &
Mohammed Alhameed³


Abstract

Balancing the classes is essential when learning from highly skewed datasets; otherwise, a learner may assign all instances to the negative class, resulting in a high false-negative rate. A precise balancing strategy is therefore required. Many researchers have addressed class imbalance in the preprocessing phase using Machine Learning (ML) methods, which offer stronger generalization performance and interpretability than random sampling techniques, in order to facilitate the learning process and improve learner performance. This research presents an effective method called HEOMGA, which combines the Heterogeneous Euclidean-Overlap Metric (HEOM) and a Genetic Algorithm (GA) to oversample the minority class. The HEOM is used to define the fitness function for the GA. To assess the proposed HEOMGA method, three benchmark customer churn prediction datasets from the UCI repository are examined using three different ML learners and evaluated with three performance metrics. The experimental results show the effectiveness of the proposed method compared with popular oversampling methods such as SMOTE, ADASYN, G-SMOTE, and Gaussian oversampling. According to the Wilcoxon signed-rank test, HEOMGA significantly outperformed the other oversampling methods in terms of recall, G-mean, and AUC.
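For readers unfamiliar with HEOM, the following is a minimal Python sketch of the distance as it is commonly defined (overlap for nominal attributes, range-normalised difference for numeric ones, maximal distance for missing values). It is not the authors' implementation: the abstract does not specify how the GA fitness function is built from HEOM, so only the metric itself is shown. The function name `heom_distance`, its parameters, and the sample records are hypothetical.

```python
import math


def heom_distance(x, y, nominal, ranges):
    """HEOM distance between two records x and y of equal length.

    nominal: set of attribute indices treated as categorical (assumption).
    ranges:  dict mapping each numeric attribute index to (max - min)
             observed in the data (assumption: precomputed by the caller).
    None marks a missing value.
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # missing value: maximal per-attribute distance
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0  # overlap metric for nominal attributes
        else:
            r = ranges[a]
            d = abs(xa - ya) / r if r > 0 else 0.0  # range-normalised difference
        total += d * d
    return math.sqrt(total)


# Hypothetical churn-style records: (tenure in months, monthly charge, contract type)
a = (12, 70.0, "monthly")
b = (24, 50.0, "yearly")
ranges = {0: 60.0, 1: 100.0}  # illustrative attribute ranges, not from the paper

print(heom_distance(a, b, nominal={2}, ranges=ranges))
```

Because every per-attribute distance lies in [0, 1], numeric and nominal attributes contribute on a comparable scale, which is why HEOM suits mixed-type customer data.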



Funding

No funding sources.

Author information

Authors and Affiliations

School of Computer Science, University of Hertfordshire, Hatfield, UK
Ibrahim AlShourbaji, Na Helian & Yi Sun
Department of Computer and Network Engineering, Jazan University, Jazan, Saudi Arabia
Ibrahim AlShourbaji
Department of Computer Science, Jazan University, Jazan, Saudi Arabia
Mohammed Alhameed


Corresponding author

Correspondence to Ibrahim AlShourbaji.

Ethics declarations

Conflict of interest

The authors have declared that no conflict of interest exists.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

AlShourbaji, I., Helian, N., Sun, Y. et al. A Novel HEOMGA Approach for Class Imbalance Problem in the Application of Customer Churn Prediction. SN COMPUT. SCI. 2, 464 (2021). https://doi.org/10.1007/s42979-021-00850-y


Received: 15 January 2021
Accepted: 24 August 2021
Published: 17 September 2021
DOI: https://doi.org/10.1007/s42979-021-00850-y
