Constraint relaxation, cost-sensitive learning and bagging for imbalanced classification problems with outliers

  • Original Paper
  • Optimization Letters

Abstract

Supervised learning develops models that distinguish data belonging to different categories (classes). When the classes are represented in unequal proportions, the problem becomes imbalanced and the performance of standard classification methods deteriorates significantly. Imbalanced classification becomes even more challenging in the presence of outliers. In this paper, we study several algorithmic modifications of the support vector machine classifier for tackling imbalanced problems with outliers. We provide computational evidence that combining cost-sensitive learning with constraint relaxation performs better, on average, than algorithmic tweaks that involve bagging, a popular approach for dealing with imbalanced problems or outliers separately. The proposed technique is embedded: it requires the solution of a single convex optimization problem, with no outlier detection preprocessing.
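To make the cost-sensitive ingredient concrete, the following is a minimal sketch of a linear support vector machine whose hinge loss is weighted by a per-class misclassification cost. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the subgradient-descent solver, the regularization weight `lam`, the step size, and the toy data are all hypothetical, and the constraint relaxation and bagging components mentioned in the abstract are not shown.

```python
def train_cost_sensitive_svm(X, y, class_cost, lam=0.01, lr=0.1, epochs=300):
    """Full-batch subgradient descent on the (assumed) objective
    lam/2 * ||w||^2 + (1/n) * sum_i class_cost[y_i] * max(0, 1 - y_i * (w.x_i + b)).
    Labels must be +1 or -1; class_cost maps each label to its cost."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # point violates the margin
                scale = class_cost[yi] * yi / n
                for j in range(d):
                    grad_w[j] -= scale * xi[j]
                grad_b -= scale
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: eight majority points (label -1), two of which
# sit on top of the single minority point (label +1) at x = 1.
X = [[-1.0]] * 6 + [[1.0]] * 2 + [[1.0]]
y = [-1] * 6 + [-1] * 2 + [1]

# Equal costs versus a higher cost for errors on the minority class.
w_eq, b_eq = train_cost_sensitive_svm(X, y, {-1: 1.0, 1: 1.0})
w_cs, b_cs = train_cost_sensitive_svm(X, y, {-1: 1.0, 1: 5.0})

print("equal costs, x = 1:    ", predict(w_eq, b_eq, [1.0]))
print("minority cost 5, x = 1:", predict(w_cs, b_cs, [1.0]))
```

With equal costs the two majority points at x = 1 outweigh the single minority point, so the region is assigned to the majority class. Raising the minority cost above the combined majority cost at that location flips the decision, which is the basic mechanism by which cost-sensitive learning counteracts class imbalance.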

Author information

Corresponding author

Correspondence to Petros Xanthopoulos.

About this article

Cite this article

Razzaghi, T., Xanthopoulos, P. & Şeref, O. Constraint relaxation, cost-sensitive learning and bagging for imbalanced classification problems with outliers. Optim Lett 11, 915–928 (2017). https://doi.org/10.1007/s11590-015-0934-z

  • DOI: https://doi.org/10.1007/s11590-015-0934-z
