Abstract
Imbalanced data classification is a challenge in data mining and machine learning. To improve classification performance on imbalanced data, this paper proposes a classification algorithm based on an optimized Mahalanobis-Taguchi system (OMTS). At the feature selection stage, important feature variables are determined by four principles: maximizing the mutual information between features and classes; minimizing the mutual information among features; maximizing the initial classification accuracy; and selecting the feature subset that yields a local maximum or minimum of the difference between the mean Mahalanobis distances (MDs) of normal and abnormal samples while retaining the largest number of features. At the threshold determination stage, particle swarm optimization is applied to the selected features to find the threshold that maximizes classification accuracy when separating normal from abnormal samples. At the classification and discrimination stage, samples are divided into two classes according to their MDs and the optimal threshold. Experimental results show that OMTS achieves accuracies of 0.92, 0.95, 0.81, 0.88, and 0.74 on the Forest Type Mapping UCI, Fetal Health Classification, Connectionist Bench, Wine Quality, and Oil datasets, respectively, and outperforms the other algorithms compared.
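The threshold determination and classification stages can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic data, and the use of a pseudo-inverse are assumptions made for illustration, and a plain grid search over candidate thresholds stands in for particle swarm optimization. Feature selection is omitted.

```python
import numpy as np

def fit_mahalanobis_space(normal):
    """Estimate mean, std, and inverse correlation matrix from normal samples."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    z = (normal - mean) / std
    corr = np.corrcoef(z, rowvar=False)
    # Assumption: pinv guards against a near-singular correlation matrix.
    return mean, std, np.linalg.pinv(corr)

def scaled_md(x, mean, std, corr_inv):
    """Scaled Mahalanobis distance z^T C^{-1} z / k for each row of x."""
    z = (x - mean) / std
    k = x.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

def best_threshold(md, labels):
    """Pick the threshold that maximizes accuracy (0 = normal, 1 = abnormal).

    Grid search over the observed MDs; a stand-in for the PSO step.
    """
    candidates = np.unique(md)
    accs = [np.mean((md > t).astype(int) == labels) for t in candidates]
    i = int(np.argmax(accs))
    return candidates[i], accs[i]

# Hypothetical imbalanced data: 200 normal samples, 20 abnormal samples.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 4))
abnormal = rng.normal(4.0, 1.0, size=(20, 4))

mean, std, corr_inv = fit_mahalanobis_space(normal)
md_normal = scaled_md(normal, mean, std, corr_inv)
md_abnormal = scaled_md(abnormal, mean, std, corr_inv)

md_all = np.concatenate([md_normal, md_abnormal])
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])
threshold, accuracy = best_threshold(md_all, labels)

# Classification stage: a sample is abnormal when its MD exceeds the threshold.
predicted = (md_all > threshold).astype(int)
```

The normal samples define the reference space, so their mean scaled MD sits close to 1, while abnormal samples fall farther away. A swarm-based search would replace `best_threshold` when the objective is not cheap to enumerate, for example when thresholds and feature subsets are tuned together.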
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 18BJY033) and the China Jiliang University Student Research Key Funding Project (Grant No. 2020X24060).
Cite this article
Mao, T., Zhou, L., Zhang, Y. et al. Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system. Appl Intell 52, 10674–10691 (2022). https://doi.org/10.1007/s10489-021-02929-8
DOI: https://doi.org/10.1007/s10489-021-02929-8