Experimental Study on Modified Radial-Based Oversampling

  • Barbara Bobowska (corresponding author)
  • Michał Woźniak
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 771)

Abstract

Although imbalanced data analysis has gained significant attention in recent years, it remains an underdeveloped area of research. The difference in the number of objects in the examined classes poses many difficulties and renders traditional, accuracy-driven machine learning methods useless. Many modern real-life applications are examples of imbalanced data classification, e.g., fraud detection, medical diagnosis, oil-spill detection in satellite images, or network anomaly detection, so it is crucial to develop new algorithms suitable for such situations. One approach to dealing with the disproportion between the numbers of instances in the classes is to apply over- or undersampling techniques. In this paper, we propose a modification of the existing Radial-Based Oversampling (RBO) algorithm. Owing to an additional constraint, the modified algorithm eliminates instances that may be problematic to classify. Additionally, a recursion mechanism was added to make the search for synthetic points more robust. The results of computer experiments carried out on benchmark datasets show that the presented algorithm is applicable.
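To illustrate why accuracy alone can mislead on imbalanced data, and what the simplest form of oversampling does, here is a minimal Python sketch. It is not the RBO algorithm or the modification proposed in the paper. It uses naive random duplication of minority instances, and the toy dataset, the two-class assumption, and the function names (accuracy, recall, random_oversample) are illustrative assumptions only.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` instances that were predicted as such."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority instances until both classes are equal in size.

    Assumes a two-class problem. This is plain random oversampling,
    NOT the radial-based method discussed in the paper.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    n_missing = n_majority - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_missing)]
    X_new = list(X) + [X[i] for i in extra]
    y_new = list(y) + [y[i] for i in extra]
    return X_new, y_new


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1) objects.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A classifier that always predicts the majority class looks good on accuracy
# while never detecting a single minority object.
always_majority = [0] * len(y)
acc = accuracy(y, always_majority)
rec = recall(y, always_majority)

# After oversampling, both classes have the same number of instances.
X_bal, y_bal = random_oversample(X, y)
```

In this toy setting the trivial classifier reaches high accuracy with zero minority-class recall, which is the kind of failure that motivates resampling. Methods such as SMOTE or RBO generate synthetic points rather than exact duplicates.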

Notes

Acknowledgements

This work is supported by the Polish National Science Center under Grant no. UMO-2015/19/B/ST6/01597, as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Electronics, Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland
