
Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system

Applied Intelligence

Abstract

Imbalanced data classification remains a challenge in data mining and machine learning. To improve classification performance on imbalanced data, this paper proposes a classification algorithm based on the optimized Mahalanobis-Taguchi system (OMTS). At the feature selection stage, important features are chosen according to four principles: maximizing the mutual information between features and classes; minimizing the mutual information among features; maximizing the initial classification accuracy; and selecting the feature subset that yields a local maximum or minimum of the difference between the mean Mahalanobis distances (MDs) of normal and abnormal samples while retaining the largest number of features. At the threshold determination stage, particle swarm optimization searches, over the selected features, for the threshold that maximizes the accuracy of separating normal from abnormal samples. At the classification and discrimination stage, samples are assigned to one of two classes by comparing their MDs with the optimal threshold. Experimental results show that OMTS achieves accuracies of 0.92, 0.95, 0.81, 0.88, and 0.74 on the Forest Type Mapping (UCI), Fetal Health Classification, Connectionist Bench, Wine Quality, and Oil datasets, respectively, and outperforms the other algorithms compared.
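
The threshold determination and classification stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the preview does not give the paper's formulas, so the scaled Mahalanobis distance follows the common MTS convention (quadratic form divided by the number of features), and the particle swarm is a basic global-best variant whose inertia and acceleration coefficients (0.7, 1.5, 1.5) are assumed values. The function names `mahalanobis_distances`, `classify`, and `pso_threshold` are hypothetical, and the feature selection stage is omitted.

```python
import numpy as np


def mahalanobis_distances(X, normal):
    """Scaled squared Mahalanobis distance of each row of X to the normal group."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    Z = (X - mean) / std  # standardize with normal-group statistics
    corr = np.corrcoef((normal - mean) / std, rowvar=False)
    inv = np.linalg.pinv(corr)  # pseudo-inverse tolerates near-singular matrices
    # MTS convention (assumed here): divide the quadratic form by the feature count
    return np.einsum("ij,jk,ik->i", Z, inv, Z) / X.shape[1]


def classify(md, threshold):
    """Label a sample abnormal (1) if its MD exceeds the threshold, else normal (0)."""
    return (md > threshold).astype(int)


def pso_threshold(md, y, n_particles=20, n_iter=50, seed=0):
    """Search for an accuracy-maximizing threshold with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = md.min(), md.max()

    def fitness(t):
        return np.mean(classify(md, t) == y)

    pos = rng.uniform(lo, hi, n_particles)  # each particle is a candidate threshold
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(t) for t in pos])
    gbest = pbest[pbest_fit.argmax()]
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Assumed coefficients: inertia 0.7, cognitive 1.5, social 1.5
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fitness(t) for t in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()]
    return gbest, pbest_fit.max()
```

In this sketch the normal group defines the reference space, so its mean MD is close to 1, and a sample is flagged abnormal when its MD exceeds the threshold returned by `pso_threshold`. Accuracy is used as the fitness only because the abstract names it as the optimization criterion; on strongly imbalanced data a different metric could be substituted in `fitness`.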



Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 18BJY033) and the China Jiliang University Student Research Key Funding Project (Grant No. 2020X24060).

Author information

Corresponding author

Correspondence to Yueyi Zhang.


About this article

Cite this article

Mao, T., Zhou, L., Zhang, Y. et al. Classification algorithm for class imbalanced data based on optimized Mahalanobis-Taguchi system. Appl Intell 52, 10674–10691 (2022). https://doi.org/10.1007/s10489-021-02929-8


  • DOI: https://doi.org/10.1007/s10489-021-02929-8
