Constructing support vector machine ensemble with segmentation for imbalanced datasets

  • Original Article
Neural Computing and Applications

Abstract

This paper proposes a novel method, the ensemble support vector machine with segmentation (SeEn–SVM), for classifying imbalanced datasets. A vector quantization algorithm segments the majority class, producing several smaller datasets that are less imbalanced than the original one, and two different weighting functions are proposed to integrate the results of the base classifiers. The goal of SeEn–SVM is to improve prediction accuracy on the minority class, which is usually the class of greater interest. SeEn–SVM is applied to six UCI datasets, and the results confirm that it performs better than previously proposed methods for the imbalance problem.
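
The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which vector quantization algorithm or which weighting functions are used, so the sketch assumes Lloyd's algorithm (k-means) as the quantizer and a plain weighted sum of base-classifier votes as the combiner. The names `vector_quantize`, `build_subproblems`, and `weighted_vote` are hypothetical, and training of the base SVMs is omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(points, k, iters=20, seed=0):
    """Split points into at most k segments with Lloyd's algorithm.

    Assumption: k-means stands in for the paper's vector quantizer.
    Empty segments are dropped from the result.
    """
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # initial code vectors
    segments = []
    for _ in range(iters):
        # Assign every point to its nearest code vector.
        segments = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, codebook[j]))
            segments[nearest].append(p)
        # Move each code vector to the mean of its segment.
        for j, seg in enumerate(segments):
            if seg:
                codebook[j] = tuple(sum(c) / len(seg) for c in zip(*seg))
    return [seg for seg in segments if seg]


def build_subproblems(majority, minority, k):
    """Pair each majority segment with the whole minority class."""
    return [(seg, list(minority)) for seg in vector_quantize(majority, k)]


def weighted_vote(decisions, weights):
    """Combine base-classifier votes in {-1, +1} by a weighted sum.

    Assumption: the weights are placeholders, not the paper's two
    weighting functions. Ties go to the minority class (+1).
    """
    score = sum(w * d for w, d in zip(weights, decisions))
    return 1 if score >= 0 else -1


# Toy data: 200 majority points and 20 minority points (ratio 10:1).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]

subproblems = build_subproblems(majority, minority, k=5)
original_ratio = len(majority) / len(minority)
ratios = [len(seg) / len(mino) for seg, mino in subproblems]
print("original majority/minority ratio:", original_ratio)
print("per-segment ratios:", ratios)
```

Each sub-dataset would then train one base SVM, and `weighted_vote` shows only the general shape of combining their outputs. Every segment contains a subset of the majority class, so each sub-dataset has a lower majority-to-minority ratio than the original data.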

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 10971223, No. 11071252) and Chinese Universities Scientific Fund (2011JS039, 2012YJ130). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Corresponding author

Correspondence to Ling Jing.

About this article

Cite this article

Li, Q., Yang, B., Li, Y. et al. Constructing support vector machine ensemble with segmentation for imbalanced datasets. Neural Comput & Applic 22 (Suppl 1), 249–256 (2013). https://doi.org/10.1007/s00521-012-1041-z

  • DOI: https://doi.org/10.1007/s00521-012-1041-z
