Abstract
Support Vector Machines (SVMs), like most learning machines, can perform poorly on the minority class when trained on imbalanced datasets, because they are designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed that calculates a new bias to adjust the function learned by the SVM.
The proposed bias takes the relative sizes of the classes into account in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training.
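The abstract does not state the formula for the new bias, so the following is only a minimal sketch of the general idea of adjusting the bias after training. It assumes the SVM's decision values on the training set are already available, and it uses a hypothetical stand-in rule (not the authors' bias): the threshold is placed midway between the mean decision values of the two classes, so each class counts equally regardless of its size. The function names and the toy values are illustrative assumptions.

```python
from statistics import mean


def balanced_bias_shift(scores, labels):
    """Return a value to add to the trained SVM's bias.

    Hypothetical stand-in rule, NOT the bias proposed in the paper:
    move the decision threshold to the midpoint between the mean
    decision value of the positive (minority) class and that of the
    negative (majority) class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    threshold = (mean(pos) + mean(neg)) / 2.0
    # Adding -threshold to every decision value moves the cut to zero.
    return -threshold


def predict(scores, shift=0.0):
    """Classify by the sign of the (optionally shifted) decision value."""
    return [1 if s + shift > 0 else -1 for s in scores]


# Toy decision values f(x) from an already trained SVM (assumed data):
# eight majority examples (label -1) and two minority examples (label +1).
scores = [-1.5, -1.2, -1.1, -1.0, -0.9, -0.8, -0.6, -0.4, -0.2, 0.6]
labels = [-1] * 8 + [1] * 2

shift = balanced_bias_shift(scores, labels)
before = predict(scores)          # original bias
after = predict(scores, shift)    # adjusted bias, no retraining

print("bias shift:", round(shift, 5))
print("minority predictions before:", before[-2:])
print("minority predictions after: ", after[-2:])
```

Only the bias changes here: the support vectors, their coefficients, and the kernel are untouched, which is why this kind of post-processing adds almost no computational cost.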
Experimental results on 34 datasets with different degrees of imbalance show that the proposed method improves classification on imbalanced datasets, as measured by standardized error measures based on sensitivity and g-means. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
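To make the evaluation measures concrete, the sketch below computes sensitivity and the g-mean, assuming the common definitions: sensitivity is the fraction of minority (positive) examples that are recognized, specificity is the same fraction for the majority class, and the g-mean is the geometric mean of the two. The helper names and the toy labels are assumptions for illustration, not taken from the paper.

```python
from math import sqrt


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return sqrt(sens * spec)


# Toy labels: eight majority examples (-1) and two minority examples (+1).
y_true = [-1] * 8 + [1] * 2

# A classifier that always predicts the majority class reaches 80%
# accuracy, yet its sensitivity and g-mean are both zero.
always_majority = [-1] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(accuracy, g_mean(y_true, always_majority))

# A classifier that finds one minority example at the cost of one
# false positive has lower accuracy but a non-zero g-mean.
mixed = [-1] * 7 + [1] + [-1, 1]
print(sensitivity_specificity(y_true, mixed), g_mean(y_true, mixed))
```

The first case shows why overall error can hide poor minority-class performance: accuracy looks acceptable while the g-mean collapses to zero.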
Cite this article
Núñez, H., Gonzalez-Abril, L. & Angulo, C. Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias. J Classif 34, 427–443 (2017). https://doi.org/10.1007/s00357-017-9242-x
DOI: https://doi.org/10.1007/s00357-017-9242-x