
Borderline Kernel Based Over-Sampling

  • María Pérez-Ortiz
  • Pedro Antonio Gutiérrez
  • César Hervás-Martínez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8073)

Abstract

The imbalanced nature of many real-world datasets is currently receiving considerable attention from the pattern recognition and machine learning communities, in both theoretical and practical aspects, and has given rise to several promising approaches for handling it. However, most preprocessing methods operate in the original input space, which can introduce distortions when they are combined with kernel classifiers, since these classifiers operate in the feature space induced by a kernel function. This paper explores the notion of the empirical feature space (a Euclidean space which is isomorphic to the feature space and therefore preserves its structure) to derive a kernel-based synthetic over-sampling technique based on borderline instances, which are considered crucial for establishing the decision boundary. The proposed methodology thus maintains the main properties of the kernel mapping while reinforcing the decision boundaries induced by the kernel machine. The results show that the proposed method achieves better performance than the same borderline over-sampling method applied in the original input space.
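
To make the procedure concrete, the following is a minimal sketch (not the authors' implementation) of the two steps the abstract describes: mapping the training patterns into the empirical feature space via the eigendecomposition of the kernel matrix, and then generating synthetic minority patterns by interpolating between borderline instances in that space. The RBF kernel, the Borderline-SMOTE-style "danger" criterion, and all function names and parameter choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import NearestNeighbors

def empirical_feature_space(X, gamma=1.0):
    # Empirical feature space: with K = P diag(l) P^T, pattern x_i is
    # represented by the i-th row of P diag(sqrt(l)), a Euclidean
    # embedding that preserves the geometry of the kernel feature space.
    K = rbf_kernel(X, X, gamma=gamma)      # n x n kernel matrix (RBF assumed)
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    keep = eigvals > 1e-10                 # discard numerically null directions
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def borderline_oversample(Z, y, minority_label, k=5, n_new=50, seed=0):
    # Borderline-SMOTE-style interpolation, but carried out in the
    # empirical feature space Z instead of the original input space.
    rng = np.random.default_rng(seed)
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    min_idx = np.flatnonzero(y == minority_label)
    # A minority pattern is 'borderline' if at least half (but not all)
    # of its k nearest neighbours belong to the majority class.
    danger = []
    for i in min_idx:
        neigh = nn_all.kneighbors(Z[i:i + 1], return_distance=False)[0][1:]
        n_maj = np.count_nonzero(y[neigh] != minority_label)
        if k / 2 <= n_maj < k:
            danger.append(i)
    if not danger:
        return Z, y
    # Interpolate each synthetic pattern between a borderline instance and
    # one of its minority-class neighbours (assumes > k minority patterns).
    nn_min = NearestNeighbors(n_neighbors=k + 1).fit(Z[min_idx])
    synthetic = []
    for _ in range(n_new):
        i = danger[rng.integers(len(danger))]
        neigh = nn_min.kneighbors(Z[i:i + 1], return_distance=False)[0][1:]
        j = min_idx[rng.choice(neigh)]
        synthetic.append(Z[i] + rng.random() * (Z[j] - Z[i]))
    Z_new = np.vstack([Z] + synthetic)
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return Z_new, y_new
```

A classifier could then be trained directly on (Z_new, y_new): since Z already realizes the kernel mapping, a linear machine in this space plays the role of the kernel machine in the original formulation. Note that, under this sketch, test patterns would also have to be projected into the same empirical feature space before classification.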

Keywords

Feature space, Input space, Kernel matrix, Training pattern, Minority class



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • María Pérez-Ortiz
  • Pedro Antonio Gutiérrez
  • César Hervás-Martínez

All authors: Dept. of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
