Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias

Núñez, Haydemar; Gonzalez-Abril, Luis; Angulo, Cecilio

doi:10.1007/s00357-017-9242-x

Published: 14 October 2017

Journal of Classification, Volume 34, pages 427–443 (2017)

Abstract

Support Vector Machines (SVMs) trained on imbalanced datasets, like most learning machines, can perform poorly on the minority class, because SVMs are designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed that computes a new bias to adjust the function learned by the SVM.
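The post-processing idea rests on the form of the SVM decision function, f(x) = Σ_i α_i y_i K(x_i, x) + b: the kernel expansion is fixed by training, and the scalar bias b only decides where the sign changes, so b can be replaced afterwards without retraining. Below is a minimal standard-library Python sketch of that decision function, assuming an RBF kernel. The names (`rbf_kernel`, `raw_score`, `predict`), the `gamma` value and the toy support vectors are illustrative assumptions, and the sketch shows only where a replacement bias enters, not how the authors compute it.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def raw_score(x: Vector, support_vectors: Sequence[Vector],
              dual_coefs: Sequence[float], kernel: Kernel = rbf_kernel) -> float:
    """Bias-free part of f(x): sum_i alpha_i * y_i * K(x_i, x).

    dual_coefs[i] is assumed to already hold the product alpha_i * y_i.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs))


def predict(x: Vector, support_vectors: Sequence[Vector],
            dual_coefs: Sequence[float], bias: float,
            kernel: Kernel = rbf_kernel) -> int:
    """Class label sign(f(x)); a tie (f(x) == 0) goes to +1 by convention here."""
    f = raw_score(x, support_vectors, dual_coefs, kernel) + bias
    return 1 if f >= 0 else -1


# Toy model: one support vector per class; +1 plays the minority class.
svs = [(0.0, 0.0), (2.0, 2.0)]
coefs = [-1.0, 1.0]
x = (1.0, 1.0)  # equidistant from both support vectors, so raw_score is 0

print(predict(x, svs, coefs, bias=0.0))   # original bias  -> 1
print(predict(x, svs, coefs, bias=-0.1))  # replaced bias, same alphas -> -1
```

Only `bias` changes between the two calls, yet the same trained coefficients give a different label for a point near the boundary; this is the lever a post-processing bias pulls.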

The proposed bias takes into account the relative sizes of the classes in order to improve performance on the minority class. This solution requires neither introducing and tuning new parameters nor modifying the standard optimization problem used to train the SVM.
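The abstract does not give the formula for the new bias, so the following is only one hypothetical way to let class proportions move the threshold, not the authors' rule. It assumes the bias-free scores g(x_i) of the training points are available (for instance from the kernel expansion of a trained SVM), labels the minority class +1, and places the threshold at the average of the two class-mean scores, each weighted by its own class's share of the data. With a small minority, that threshold slides toward the majority's mean score, so more points are assigned to the minority class. The function names and toy scores are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence


def class_size_weighted_bias(raw_scores: Sequence[float],
                             labels: Sequence[int]) -> float:
    """HYPOTHETICAL bias rule for illustration; not the formula from the paper.

    raw_scores[i] is the bias-free SVM output g(x_i) for training point i and
    labels[i] is +1 (minority) or -1 (majority). The threshold t is the
    average of the two class means, each weighted by its class size, and the
    returned bias is b = -t so that the decision function is g(x) + b.
    """
    pos = [s for s, y in zip(raw_scores, labels) if y == 1]
    neg = [s for s, y in zip(raw_scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    threshold = (len(pos) * mean(pos) + len(neg) * mean(neg)) / (len(pos) + len(neg))
    return -threshold


def midpoint_bias(raw_scores: Sequence[float], labels: Sequence[int]) -> float:
    """Baseline for comparison: threshold halfway between the two class means."""
    pos = [s for s, y in zip(raw_scores, labels) if y == 1]
    neg = [s for s, y in zip(raw_scores, labels) if y == -1]
    return -(mean(pos) + mean(neg)) / 2


def recall(raw_scores: Sequence[float], labels: Sequence[int],
           bias: float, cls: int) -> float:
    """Fraction of points of class `cls` whose sign(g(x) + b) equals `cls`."""
    hits = [(1 if s + bias >= 0 else -1) == cls
            for s, y in zip(raw_scores, labels) if y == cls]
    return sum(hits) / len(hits)


# Toy scores: 2 minority points (+1) and 8 majority points (-1).
scores = [1.0, -0.5, -1.2, -1.0, -0.9, -1.1, -0.8, -1.0, -1.0, -1.0]
labels = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]

for name, b in [("midpoint", midpoint_bias(scores, labels)),
                ("size-weighted", class_size_weighted_bias(scores, labels))]:
    print(name, round(b, 3),
          recall(scores, labels, b, 1), recall(scores, labels, b, -1))
```

On these toy scores the size-weighted rule recovers the second minority point that the midpoint rule misses, without misclassifying any majority point. Weighting each class mean by its own size is just one way to encode class proportions; under severe imbalance it pushes the threshold close to the majority mean and can over-correct, so it should not be read as a substitute for the bias defined in the full paper.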

Experimental results on 34 datasets with different degrees of imbalance, evaluated with standardized error measures based on sensitivity and the g-mean, show that the proposed method improves classification on imbalanced datasets. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
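The evaluation relies on sensitivity and the g-mean. Assuming the standard definitions, sensitivity is the true-positive rate on the minority (positive) class, specificity is the true-negative rate on the majority class, and the g-mean is their geometric mean, sqrt(sensitivity × specificity). The minimal sketch below (the function name is illustrative) computes all three from label lists and shows why overall accuracy alone can hide a classifier that never detects the minority class.

```python
import math
from typing import Sequence, Tuple


def sensitivity_specificity_gmean(y_true: Sequence[int], y_pred: Sequence[int],
                                  positive: int = 1) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, g-mean), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# 5 minority (+1) and 95 majority (-1) examples.
y_true = [1] * 5 + [-1] * 95

# Always predicting the majority class: 95% accuracy, but g-mean 0.
always_majority = [-1] * 100
print(sensitivity_specificity_gmean(y_true, always_majority))  # (0.0, 1.0, 0.0)

# Finding 4 of 5 minority points at the cost of 5 false alarms:
# lower accuracy (94%), but a g-mean of about 0.87.
y_pred = [1, 1, 1, 1, -1] + [1] * 5 + [-1] * 90
print(sensitivity_specificity_gmean(y_true, y_pred))
```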

Author information

Authors and Affiliations

Escuela de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Paseo Los Ilustres, Caracas, 1040, Venezuela
Haydemar Núñez
Universidad de Sevilla, Sevilla, Spain
Luis Gonzalez-Abril
Technical University of Catalonia, Barcelona, Spain
Cecilio Angulo

Corresponding author

Correspondence to Haydemar Núñez.

About this article

Cite this article

Núñez, H., Gonzalez-Abril, L. & Angulo, C. Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias. J Classif 34, 427–443 (2017). https://doi.org/10.1007/s00357-017-9242-x

Published: 14 October 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s00357-017-9242-x
