Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias

Journal of Classification

Abstract

Support Vector Machines (SVMs), like most learning machines, can perform poorly on the minority class when trained on imbalanced datasets, because they were designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed, based on calculating a new bias to adjust the function learned by the SVM.

The proposed bias takes into account the relative sizes of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training.
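The abstract does not state the bias formula itself, so the following is only a minimal sketch of the general idea of post-processing a trained SVM by replacing its bias. The shifting rule used here (a class-size-weighted average of the two class mean scores) and the names `adjusted_bias` and `predict` are illustrative assumptions, not the rule proposed in the article.

```python
from statistics import mean


def adjusted_bias(scores, labels, bias=0.0):
    """Return a hypothetical new bias for a trained SVM (illustrative rule only).

    scores: decision values f(x) = w.x + bias from the already trained model
    labels: +1 for the minority class, -1 for the majority class
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    n_pos, n_neg = len(pos), len(neg)
    # Assumed rule: size-weighted average of the class mean scores. When the
    # negative class is larger, the threshold lands closer to the negative
    # mean, which enlarges the region assigned to the minority class.
    threshold = (n_pos * mean(pos) + n_neg * mean(neg)) / (n_pos + n_neg)
    # Deciding with sign(f(x) - threshold) is equivalent to using this bias.
    return bias - threshold


def predict(scores, old_bias, new_bias):
    """Swap the old bias for the new one in each score and take the sign."""
    return [1 if s - old_bias + new_bias > 0 else -1 for s in scores]


# Toy decision values: six majority (-1) points and two minority (+1) points.
scores = [-2.0, -1.5, -1.1, -1.2, -0.8, -1.6, -0.3, 0.4]
labels = [-1, -1, -1, -1, -1, -1, 1, 1]

before = predict(scores, 0.0, 0.0)
new_b = adjusted_bias(scores, labels, bias=0.0)
after = predict(scores, 0.0, new_b)
```

On this toy data the original bias misses one of the two minority points, while the shifted bias recovers both at the cost of one misclassified majority point. No retraining and no additional hyperparameter is involved, which is the property the abstract emphasizes.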

Experimental results on 34 datasets with different degrees of imbalance show that the proposed method improves classification on imbalanced datasets, as measured by standardized error measures based on sensitivity and g-means. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
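For readers unfamiliar with the measures named above, here is a small sketch assuming the usual definitions: sensitivity is the true positive rate on the minority class, and the g-mean is the geometric mean of sensitivity and specificity. The function name `sensitivity_and_gmean` and the toy labels are illustrative, not taken from the article.

```python
from math import sqrt


def sensitivity_and_gmean(y_true, y_pred, positive=1):
    """Compute sensitivity and g-mean, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # recall on the minority class
    specificity = tn / (tn + fp)  # recall on the majority class
    return sensitivity, sqrt(sensitivity * specificity)


# Two minority (+1) and six majority (-1) examples; one minority point is missed.
y_true = [1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1, -1, -1, -1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, gmean = sensitivity_and_gmean(y_true, y_pred)
```

Here overall accuracy is 0.875 even though half of the minority class is misclassified, whereas sensitivity (0.5) and the g-mean (about 0.707) expose the weakness. This is why measures of this kind are preferred over plain accuracy on imbalanced data.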

Author information

Corresponding author

Correspondence to Haydemar Núñez.

About this article

Cite this article

Núñez, H., Gonzalez-Abril, L. & Angulo, C. Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias. J Classif 34, 427–443 (2017). https://doi.org/10.1007/s00357-017-9242-x

  • DOI: https://doi.org/10.1007/s00357-017-9242-x
