Abstract
Supervised learning consists of developing models that can distinguish data belonging to different categories (classes). When the classes are represented in unequal proportions, the problem is imbalanced and the performance of standard classification methods deteriorates significantly. Imbalanced classification becomes even more challenging in the presence of outliers. In this paper, we study several algorithmic modifications of the support vector machine classifier for tackling imbalanced problems with outliers. We provide computational evidence that combining cost-sensitive learning with constraint relaxation performs better, on average, than algorithmic modifications that involve bagging, a popular approach for dealing with imbalanced problems or outliers separately. The proposed technique is embedded: it requires the solution of a single convex optimization problem and no outlier-detection preprocessing.
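To make the combination of cost-sensitive learning and constraint relaxation concrete, the following sketches how the two ideas can sit in one soft-margin SVM problem. It is an illustration under stated assumptions, not the formulation from the paper: the symbols $C_+$ and $C_-$ (class-dependent misclassification costs), $\upsilon_i$ (penalty-free slack) and $\Upsilon$ (average free-slack budget per sample) are notation chosen here, and the paper's exact model may differ.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\upsilon}\quad
  & \tfrac{1}{2}\lVert w\rVert^{2}
    + C_{+}\sum_{i:\,y_i=+1}\xi_i
    + C_{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad
  & y_i\bigl(w^{\top}x_i + b\bigr) \ge 1 - \xi_i - \upsilon_i,
    \qquad i = 1,\dots,n, \\
  & \sum_{i=1}^{n}\upsilon_i \le n\,\Upsilon, \\
  & \xi_i \ge 0,\quad \upsilon_i \ge 0,
    \qquad i = 1,\dots,n.
\end{aligned}
```

The objective is a convex quadratic and all constraints are linear, so this is a single convex program. Setting $\Upsilon = 0$ forces every $\upsilon_i$ to zero and recovers a cost-sensitive soft-margin SVM; additionally setting $C_+ = C_-$ gives the standard soft-margin SVM. A larger cost on the minority class counteracts the imbalance, while the limited budget of free slack can be spent on the few points with the largest margin violations, which is how outliers can be discounted without a separate detection step.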
Cite this article
Razzaghi, T., Xanthopoulos, P. & Şeref, O. Constraint relaxation, cost-sensitive learning and bagging for imbalanced classification problems with outliers. Optim Lett 11, 915–928 (2017). https://doi.org/10.1007/s11590-015-0934-z
DOI: https://doi.org/10.1007/s11590-015-0934-z