Majority-Class Aware Support Vector Domain Oversampling for Imbalanced Classification Problems

  • Markus Kächele
  • Patrick Thiam
  • Günther Palm
  • Friedhelm Schwenker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8774)


This work presents a method for overcoming the difficulties posed by imbalanced classification problems. The proposed algorithm fits a data description to the minority class but, in contrast to many other algorithms, uses awareness of the majority-class samples to improve the estimation process. The majority samples are incorporated into the optimization procedure, and the resulting domain descriptions are generally superior to those obtained without knowledge of the majority class. Extensive experimental results support the validity of this approach.
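
The general idea of majority-aware oversampling can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the optimization problem, so the sketch swaps in a much cruder stand-in. It assumes a hypersphere around the minority centroid as the "domain description" and a simple nearest-neighbour test as the "majority awareness". The function names `fit_sphere` and `oversample`, the `quantile` and jitter settings, and the toy data are all hypothetical.

```python
import math
import random


def fit_sphere(points, quantile=0.9):
    """Crude domain description (assumption): centroid plus a radius
    that covers roughly `quantile` of the given points."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]


def oversample(minority, majority, n_new, seed=0, max_tries=10000):
    """Generate up to `n_new` synthetic minority samples by rejection sampling."""
    rng = random.Random(seed)
    center, radius = fit_sphere(minority)
    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        # Candidate: random interpolation between two minority samples plus jitter.
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        cand = [x + t * (y - x) + rng.gauss(0, 0.05 * radius)
                for x, y in zip(a, b)]
        # Reject candidates outside the minority domain description.
        if math.dist(cand, center) > radius:
            continue
        # Majority awareness (stand-in): reject candidates whose nearest
        # neighbour is a majority sample rather than a minority sample.
        d_min = min(math.dist(cand, p) for p in minority)
        d_maj = min(math.dist(cand, q) for q in majority)
        if d_maj < d_min:
            continue
        synthetic.append(cand)
    return synthetic


# Toy data (hypothetical): a small minority cluster and a distant majority cluster.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
majority = [(3.0, 3.0), (3.0, 4.0), (4.0, 3.0), (4.0, 4.0)]
new_points = oversample(minority, majority, n_new=20)
```

In the paper's setting, the hypersphere would be replaced by a kernel-based description such as a one-class SVM, with the majority samples entering the optimization itself rather than a post-hoc rejection test.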


  • Imbalanced classification
  • One-class SVM
  • Kernel methods





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Markus Kächele (1)
  • Patrick Thiam (1)
  • Günther Palm (1)
  • Friedhelm Schwenker (1)
  1. Institute of Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, Germany
