Borderline Kernel Based Over-Sampling
The imbalanced nature of much real-world data is receiving considerable attention from the pattern recognition and machine learning communities, in both theoretical and practical terms, and has given rise to several promising approaches for handling it. However, preprocessing methods operate in the original input space, which introduces distortions when they are combined with kernel classifiers, since these operate in the feature space induced by a kernel function. This paper explores the notion of empirical feature space (a Euclidean space which is isomorphic to the feature space and therefore preserves its structure) to derive a kernel-based synthetic over-sampling technique based on borderline instances, which are considered crucial for establishing the decision boundary. The proposed methodology therefore maintains the main properties of the kernel mapping while reinforcing the decision boundaries induced by a kernel machine. The results show that the proposed method achieves better results than the same borderline over-sampling method applied in the original input space.
Keywords: Feature Space, Input Space, Kernel Matrix, Training Pattern, Minority Class
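The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rbf_kernel`, `empirical_feature_space`, `borderline_oversample`), the RBF kernel, and the parameters `gamma`, `m`, `k`, and `n_new` are all assumptions made for illustration. The sketch embeds the training patterns in an empirical feature space via the eigendecomposition of the kernel matrix, so that Euclidean dot products reproduce the kernel values, and then applies a Borderline-SMOTE-style rule there: a minority pattern counts as borderline when at least half, but not all, of its `m` nearest neighbours belong to the majority class.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def empirical_feature_space(K, tol=1e-10):
    """Embed training patterns so that Z @ Z.T approximates K.

    Uses K = P diag(lam) P^T and returns Z = P diag(sqrt(lam)),
    keeping only eigenvalues that are numerically positive.
    """
    lam, P = np.linalg.eigh(K)
    keep = lam > tol * lam.max()
    return P[:, keep] * np.sqrt(lam[keep])

def borderline_oversample(Z, y, minority=1, m=5, k=3, n_new=10, seed=0):
    """Borderline-SMOTE-style interpolation in the embedded space (illustrative)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances in the empirical feature space.
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a pattern is not its own neighbour
    min_idx = np.flatnonzero(y == minority)

    # Borderline ("danger") minority patterns: m/2 <= majority neighbours < m.
    danger = []
    for i in min_idx:
        nn = np.argsort(d[i])[:m]
        n_maj = int(np.sum(y[nn] != minority))
        if m / 2 <= n_maj < m:
            danger.append(i)
    if not danger:
        return np.empty((0, Z.shape[1]))

    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        others = min_idx[min_idx != i]
        # k nearest minority neighbours of the borderline pattern.
        nn_min = others[np.argsort(d[i, others])[:k]]
        j = rng.choice(nn_min)
        # Random point on the segment between pattern i and neighbour j.
        synthetic.append(Z[i] + rng.random() * (Z[j] - Z[i]))
    return np.array(synthetic)
```

Because the synthetic patterns live in the embedded space rather than the input space, one plausible way to use them is to train a linear classifier on the rows of `Z` stacked with the new patterns; on the original patterns alone, the linear kernel `Z @ Z.T` coincides with `K` up to numerical tolerance.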