Imbalanced data classification based on scaling kernel-based support vector machine

Zhang, Yong; Fu, Panpan; Liu, Wenzhe; Chen, Guolong

doi:10.1007/s00521-014-1584-2

Original Article
Published: 05 April 2014

Volume 25, pages 927–935, (2014)

Neural Computing and Applications

Yong Zhang¹,
Panpan Fu¹,²,
Wenzhe Liu¹ &
…
Guolong Chen²

Abstract

In many classification problems, the class distribution is imbalanced. Learning from imbalanced data is a notable challenge in knowledge discovery and data mining. In this paper, we propose a scaling kernel-based support vector machine (SVM) approach to the multi-class imbalanced data classification problem. We first use the standard SVM algorithm to obtain an approximate hyperplane. We then present a scaling kernel function and calculate its parameters using the chi-square test and weighting factors. Experimental results on KEEL data sets show that the proposed algorithm can resolve the degradation in classifier performance caused by skewed data distributions and that it generalizes well.
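
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the scaling function, so the sketch assumes the conformal form K~(x, z) = D(x) D(z) K(x, z) with D(x) = exp(-k f(x)^2), where f is the decision function of the first-pass SVM. The names `approx_decision`, `k_pos`, and `k_neg` are hypothetical. The per-class decay rates are fixed by hand here, whereas the paper derives its parameters from the chi-square test and weighting factors.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Standard Gaussian (RBF) kernel between two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def scaling_factor(f_value, k_pos, k_neg):
    """Assumed conformal factor D(x) = exp(-k * f(x)^2).

    f_value is the first-pass decision value f(x). k_pos and k_neg are
    hypothetical decay rates for the two sides of the boundary; they are
    hand-picked here, not computed as in the paper.
    """
    k = k_pos if f_value >= 0 else k_neg
    return math.exp(-k * f_value ** 2)


def scaled_kernel(x, z, f, k_pos=1.0, k_neg=0.25, gamma=0.5):
    """Scaled kernel K~(x, z) = D(x) * D(z) * K(x, z)."""
    d_x = scaling_factor(f(x), k_pos, k_neg)
    d_z = scaling_factor(f(z), k_pos, k_neg)
    return d_x * d_z * rbf_kernel(x, z, gamma)


def approx_decision(x):
    """Stand-in for the first-pass SVM: a fixed linear boundary x0 + x1 = 1."""
    return x[0] + x[1] - 1.0


if __name__ == "__main__":
    for point in [(0.5, 0.5), (1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]:
        value = scaled_kernel(point, point, approx_decision)
        print(point, approx_decision(point), value)
```

Under these assumptions, points on the approximate boundary keep their original kernel value, and points far from it are scaled down, more slowly on the side with the smaller decay rate. Multiplying a valid kernel by D(x) D(z) keeps it positive semidefinite, so the scaled kernel could be passed to a second SVM training run.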

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (No. 61373127), the China Postdoctoral Science Foundation (No. 20110491530), and the University Scientific Research Project of Liaoning Education Department of China (No. 2011186).

Author information

Authors and Affiliations

School of Computer and Information Technology, Liaoning Normal University, No.1, Liushu South Street, Ganjingzi District, Dalian, 116081, Liaoning Province, China
Yong Zhang, Panpan Fu & Wenzhe Liu
College of Information Engineering, Suzhou University, Suzhou, 234000, China
Panpan Fu & Guolong Chen

Corresponding author

Correspondence to Yong Zhang.

About this article

Cite this article

Zhang, Y., Fu, P., Liu, W. et al. Imbalanced data classification based on scaling kernel-based support vector machine. Neural Comput & Applic 25, 927–935 (2014). https://doi.org/10.1007/s00521-014-1584-2

Received: 20 December 2013
Accepted: 24 March 2014
Published: 05 April 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s00521-014-1584-2
