Abstract
This work presents a method for overcoming the difficulties posed by imbalanced classification problems. The proposed algorithm fits a data description to the minority class but, in contrast to many other algorithms, also uses the samples of the majority class to improve the estimation process. The majority samples are incorporated into the optimization procedure, and the resulting domain descriptions are generally superior to those obtained without knowledge of the majority class. Extensive experimental results support the validity of this approach.
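To make the idea of incorporating majority samples into the optimization concrete, the block below sketches one well-known formulation of this kind: a support vector domain description with negative examples. It is an illustrative assumption, not necessarily the objective used in this paper. All symbols are introduced here for illustration only: $x_i$ are the minority samples, $z_j$ the majority samples, $a$ and $R$ the center and radius of the enclosing sphere, $\xi_i$ and $\zeta_j$ slack variables, and $C_1$, $C_2$ trade-off constants.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi,\,\zeta}\quad & R^2 + C_1 \sum_{i=1}^{n} \xi_i + C_2 \sum_{j=1}^{m} \zeta_j \\
\text{s.t.}\quad & \lVert x_i - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad i = 1,\dots,n, \\
                 & \lVert z_j - a \rVert^2 \ge R^2 - \zeta_j, \qquad \zeta_j \ge 0, \quad j = 1,\dots,m.
\end{aligned}
```

Under this sketch, the first constraint keeps minority samples inside the sphere and the second pushes majority samples outside it, so the description is shaped by both classes rather than by the minority class alone.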
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kächele, M., Thiam, P., Palm, G., Schwenker, F. (2014). Majority-Class Aware Support Vector Domain Oversampling for Imbalanced Classification Problems. In: El Gayar, N., Schwenker, F., Suen, C. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2014. Lecture Notes in Computer Science, vol 8774. Springer, Cham. https://doi.org/10.1007/978-3-319-11656-3_8
DOI: https://doi.org/10.1007/978-3-319-11656-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11655-6
Online ISBN: 978-3-319-11656-3
eBook Packages: Computer Science, Computer Science (R0)