Cluster-Based Under-Sampling Using Farthest Neighbour Technique for Imbalanced Datasets

Rekha, G.; Tyagi, Amit Kumar

doi:10.1007/978-3-030-49339-4_5

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1180))

Included in the following conference series:

International Conference on Innovations in Bio-Inspired Computing and Applications

403 Accesses
3 Citations

Abstract

In domain of data mining, learning from imbalanced class distribution datasets is a challenging problem for conventional classifiers. The class imbalance exists when the number of samples of one class is much lesser than the ones of the other classes. In real-world classification problems, data samples often have unequal class distribution. This problem is represented as a class imbalance problem. However, many solutions have been proposed in the literature to improve classifier performance. But recent works entitlement that imbalanced dataset is not a problem in itself. The degradation of classifier performance is also linked with many factors like small sample size, sample overlapping, class disjunct and many more. In this work, we proposed cluster-based under-sampling based on farthest neighbors. The majority class samples are selected based on the average distance to all minority class samples in the cluster are farthest. The experimental results show that our cluster-based under-sampling approach outperform with existing techniques in the previous studies.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://sci2s.ugr.es/keel/imbalanced.php.

References

Lin, W.-Y., Hu, Y.-H., Tsai, C.-F.: Machine learning in financial crisis prediction: a survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(4), 421–436 (2012)
Google Scholar
Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.: Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotech. J. 13, 8–17 (2015)
Google Scholar
Mahajan, V., Misra, R., Mahajan, R.: Review of data mining techniques for churn prediction in telecom. J. Inf. Organ. Sci. 39(2), 183–197 (2015)
Google Scholar
Zafeiriou, S., Zhang, C., Zhang, Z.: A survey on face detection in the wild: past, present and future. Comput. Vis. Image Underst. 138, 1–24 (2015)
Google Scholar
West, J., Bhattacharya, M.: Intelligent financial fraud detection: a comprehensive review. Comput. Secur. 57, 47–66 (2016)
Google Scholar
Malhotra, R.: A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput. 27, 504–518 (2015)
Google Scholar
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, pp. 179–186 (1997)
Google Scholar
Estabrooks, A., Japkowicz, N.: A mixture-of-experts framework for learning from imbalanced data sets. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds.) IDA 2001. LNCS, vol. 2189, pp. 34–43. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44816-0_4
Chapter Google Scholar
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
MATH Google Scholar
Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International Conference on Intelligent Computing, pp. 878–887. Springer, Heidelberg, August 2005
Google Scholar
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 475–482. Springer, Heidelberg, April 2009
Google Scholar
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol. 97, pp. 179–186, July 1997
Google Scholar
Estabrooks, A., Jo, T., Japkowicz, N.: A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20(1), 18–36 (2004)
MathSciNet Google Scholar
Wu, G., Chang, E.Y.: KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17(6), 786–795 (2005)
Google Scholar
Ohsaki, M., Wang, P., Matsuda, K., Katagiri, S., Watanabe, H., Ralescu, A.: Confusion-matrix-based kernel logistic regression for imbalanced data classification. IEEE Trans. Knowl. Data Eng. 29(9), 1806–1819 (2017)
Google Scholar
Khan, S.H., Hayat, M., Bennamoun, M., Sohel, F.A., Togneri, R.: Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans. Neural Networks Learn. Syst. 29(8), 3573–3587 (2017)
Google Scholar
Dong, Q., Gong, S., Zhu, X.: Imbalanced deep learning by minority class incremental rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1367–1381 (2018)
Google Scholar
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
MATH Google Scholar
Ren, Y., Zhang, L., Suganthan, P.N.: Ensemble classification and regression-recent developments, applications and future directions. IEEE Comput. Intell. Mag. 11(1), 41–53 (2016)
Google Scholar
Martínez-Muñoz, G., Suárez, A.: Using boosting to prune bagging ensembles. Pattern Recogn. Lett. 28(1), 156–165 (2007)
Google Scholar
Li, Z.X., Zhao, L.D.: A SVM classifier for imbalanced datasets based on SMOTEBoost. Syst. Eng. 26(5), 116–119 (2008)
Google Scholar
Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2009)
Google Scholar
Guo, H., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explor. Newsl. 6(1), 30–39 (2004)
Google Scholar
Hakim, L., Sartono, B., Saefuddin, A.: Bagging based ensemble classification method on imbalance datasets. Int. J. Comput. Sci. Netw. 6, 7 (2017)
Google Scholar
Yongqing, Z., Min, Z., Danling, Z., Gang, M., Daichuan, M.: Improved SMOTEBagging and its application in imbalanced data classification. In: IEEE Conference Anthology, pp. 1–5. IEEE, January 2013
Google Scholar
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(4), 463–484 (2011)
Google Scholar
Rekha, G., Tyagi, A.K., Krishna Reddy, V.: A wide scale classification of class imbalance problem and its solutions: a systematic literature review. J. Comput. Sci. 15, 886–929 (2019)
Google Scholar
Rekha, G., Tyagi, A.K., Krishna Reddy, V.: Solving class imbalance problem using bagging, boosting techniques, with and without using noise filtering method. Int. J. Hybrid Intell. Syst. 15, 67–76 (2019)
Google Scholar

Download references

Acknowledgment

This Research is funded by Anumit Academy’s Research and Innovation Network (AARIN), India. The Author Would Like to Thank AARIN, India, a Research Network for Supporting The Project Through its Financial Assistance.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, India
G. Rekha
School of Computing Science and Engineering, Vellore Institute of Technology, Chennai Campus, Chennai, 600127, Tamilnadu, India
Amit Kumar Tyagi

Authors

G. Rekha
View author publications
You can also search for this author in PubMed Google Scholar
Amit Kumar Tyagi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to G. Rekha or Amit Kumar Tyagi .

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR), Auburn, WA, USA
Ajith Abraham
Utkal University, Bhubaneswar, Odisha, India
Mrutyunjaya Panda
Department of Electronics Engineering, GIET University, Gunupur, Odisha, India
Subhrajit Pradhan
Area of Project Engineering, University of Cordoba, Córdoba, Córdoba, Spain
Laura Garcia-Hernandez
School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
Kun Ma

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Rekha, G., Tyagi, A.K. (2021). Cluster-Based Under-Sampling Using Farthest Neighbour Technique for Imbalanced Datasets. In: Abraham, A., Panda, M., Pradhan, S., Garcia-Hernandez, L., Ma, K. (eds) Innovations in Bio-Inspired Computing and Applications. IBICA 2019. Advances in Intelligent Systems and Computing, vol 1180. Springer, Cham. https://doi.org/10.1007/978-3-030-49339-4_5

Download citation

DOI: https://doi.org/10.1007/978-3-030-49339-4_5
Published: 06 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49338-7
Online ISBN: 978-3-030-49339-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics