Experimental Study on Modified Radial-Based Oversampling
Although imbalanced data analysis has gained significant attention in recent years, it remains an underdeveloped area of research. The difference in the number of objects between the examined classes poses many difficulties and renders traditional, accuracy-driven machine learning methods ineffective. Many modern real-life applications are examples of imbalanced data classification, e.g. fraud detection, medical diagnosis, oil spill detection in satellite images, or network anomaly detection, so it is crucial to develop new algorithms suited to such situations. One approach to dealing with the disproportion between the numbers of objects in the classes is to apply over- or undersampling techniques. In this paper, we propose a modification of the existing Radial-Based Oversampling (RBO) algorithm. Owing to an additional constraint, the modified algorithm eliminates instances that may be problematic to classify. Additionally, a recursion mechanism was added to make the search for synthetic points more robust. The results of computer experiments carried out on benchmark datasets show that the presented algorithm is applicable.
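The remark about accuracy-driven methods can be illustrated with a minimal sketch. It is not part of the proposed algorithm; the 95:5 class split, the always-majority classifier, and all function names are hypothetical and chosen only for illustration. A classifier that never predicts the minority class still reaches high accuracy, while its minority-class recall and balanced accuracy reveal the failure.

```python
def accuracy(y_true, y_pred):
    """Fraction of all objects that were labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of the objects of class `label` that were recognised."""
    predictions_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predictions_for_class) / len(predictions_for_class)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical imbalanced problem: 95 majority objects (0), 5 minority objects (1).
y_true = [0] * 95 + [1] * 5
# Hypothetical trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, 1)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"accuracy:          {acc:.2f}")
print(f"minority recall:   {minority_recall:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

In this toy setting the accuracy is high although no minority object is detected, which is the situation that over- and undersampling techniques aim to counteract.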
This work is supported by the Polish National Science Center under Grant no. UMO-2015/19/B/ST6/01597, as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology.