Abstract
In domain of data mining, learning from imbalanced class distribution datasets is a challenging problem for conventional classifiers. The class imbalance exists when the number of samples of one class is much lesser than the ones of the other classes. In real-world classification problems, data samples often have unequal class distribution. This problem is represented as a class imbalance problem. However, many solutions have been proposed in the literature to improve classifier performance. But recent works entitlement that imbalanced dataset is not a problem in itself. The degradation of classifier performance is also linked with many factors like small sample size, sample overlapping, class disjunct and many more. In this work, we proposed cluster-based under-sampling based on farthest neighbors. The majority class samples are selected based on the average distance to all minority class samples in the cluster are farthest. The experimental results show that our cluster-based under-sampling approach outperform with existing techniques in the previous studies.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Lin, W.-Y., Hu, Y.-H., Tsai, C.-F.: Machine learning in financial crisis prediction: a survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(4), 421–436 (2012)
Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.: Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotech. J. 13, 8–17 (2015)
Mahajan, V., Misra, R., Mahajan, R.: Review of data mining techniques for churn prediction in telecom. J. Inf. Organ. Sci. 39(2), 183–197 (2015)
Zafeiriou, S., Zhang, C., Zhang, Z.: A survey on face detection in the wild: past, present and future. Comput. Vis. Image Underst. 138, 1–24 (2015)
West, J., Bhattacharya, M.: Intelligent financial fraud detection: a comprehensive review. Comput. Secur. 57, 47–66 (2016)
Malhotra, R.: A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput. 27, 504–518 (2015)
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, pp. 179–186 (1997)
Estabrooks, A., Japkowicz, N.: A mixture-of-experts framework for learning from imbalanced data sets. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds.) IDA 2001. LNCS, vol. 2189, pp. 34–43. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44816-0_4
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International Conference on Intelligent Computing, pp. 878–887. Springer, Heidelberg, August 2005
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 475–482. Springer, Heidelberg, April 2009
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol. 97, pp. 179–186, July 1997
Estabrooks, A., Jo, T., Japkowicz, N.: A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20(1), 18–36 (2004)
Wu, G., Chang, E.Y.: KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)
Ohsaki, M., Wang, P., Matsuda, K., Katagiri, S., Watanabe, H., Ralescu, A.: Confusion-matrix-based kernel logistic regression for imbalanced data classification. IEEE Trans. Knowl. Data Eng. 29(9), 1806–1819 (2017)
Khan, S.H., Hayat, M., Bennamoun, M., Sohel, F.A., Togneri, R.: Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans. Neural Networks Learn. Syst. 29(8), 3573–3587 (2017)
Dong, Q., Gong, S., Zhu, X.: Imbalanced deep learning by minority class incremental rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1367–1381 (2018)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Ren, Y., Zhang, L., Suganthan, P.N.: Ensemble classification and regression-recent developments, applications and future directions. IEEE Comput. Intell. Mag. 11(1), 41–53 (2016)
MartÃnez-Muñoz, G., Suárez, A.: Using boosting to prune bagging ensembles. Pattern Recogn. Lett. 28(1), 156–165 (2007)
Li, Z.X., Zhao, L.D.: A SVM classifier for imbalanced datasets based on SMOTEBoost. Syst. Eng. 26(5), 116–119 (2008)
Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2009)
Guo, H., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explor. Newsl. 6(1), 30–39 (2004)
Hakim, L., Sartono, B., Saefuddin, A.: Bagging based ensemble classification method on imbalance datasets. Int. J. Comput. Sci. Netw. 6, 7 (2017)
Yongqing, Z., Min, Z., Danling, Z., Gang, M., Daichuan, M.: Improved SMOTEBagging and its application in imbalanced data classification. In: IEEE Conference Anthology, pp. 1–5. IEEE, January 2013
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(4), 463–484 (2011)
Rekha, G., Tyagi, A.K., Krishna Reddy, V.: A wide scale classification of class imbalance problem and its solutions: a systematic literature review. J. Comput. Sci. 15, 886–929 (2019)
Rekha, G., Tyagi, A.K., Krishna Reddy, V.: Solving class imbalance problem using bagging, boosting techniques, with and without using noise filtering method. Int. J. Hybrid Intell. Syst. 15, 67–76 (2019)
Acknowledgment
This Research is funded by Anumit Academy’s Research and Innovation Network (AARIN), India. The Author Would Like to Thank AARIN, India, a Research Network for Supporting The Project Through its Financial Assistance.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rekha, G., Tyagi, A.K. (2021). Cluster-Based Under-Sampling Using Farthest Neighbour Technique for Imbalanced Datasets. In: Abraham, A., Panda, M., Pradhan, S., Garcia-Hernandez, L., Ma, K. (eds) Innovations in Bio-Inspired Computing and Applications. IBICA 2019. Advances in Intelligent Systems and Computing, vol 1180. Springer, Cham. https://doi.org/10.1007/978-3-030-49339-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-49339-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49338-7
Online ISBN: 978-3-030-49339-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)