Combining Random Subspace Approach with SMOTE Oversampling for Imbalanced Data Classification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11734)

Abstract

The following work applies a hybrid approach that combines the Random Subspace method with SMOTE oversampling to address the problem of imbalanced data classification. The paper proposes an ensemble diversified by the Random Subspace approach, where the training set is oversampled separately within each reduced feature subset. The algorithm was evaluated through computer experiments carried out on benchmark datasets with three different base classifiers.
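The procedure outlined in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `NearestCentroid` is a stand-in for the base classifiers, the `smote` helper is a simplified variant of SMOTE (interpolation toward a random one of the k nearest minority neighbours), and all names (`RandomSubspaceSMOTE`, `subspace_size`, etc.) are hypothetical. The key point it shows is that oversampling happens *inside* each random feature subspace, after projection, rather than once on the full feature set.

```python
import random
from collections import Counter


def smote(minority, n_new, k=3, rng=random):
    """Simplified SMOTE: synthesise n_new minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        neighbours = sorted(
            minority,
            key=lambda b: sum((x - y) ** 2 for x, y in zip(a, b)),
        )[1:k + 1]
        b = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic


class NearestCentroid:
    """Toy base classifier: predict the class whose centroid is closest."""

    def fit(self, X, y):
        groups = {}
        for xi, yi in zip(X, y):
            groups.setdefault(yi, []).append(xi)
        self.centroids = {
            c: tuple(sum(col) / len(v) for col in zip(*v))
            for c, v in groups.items()
        }
        return self

    def predict(self, X):
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]


class RandomSubspaceSMOTE:
    """Ensemble: each member trains on a random feature subset, with SMOTE
    applied inside that subspace; prediction is by majority vote."""

    def __init__(self, n_members=5, subspace_size=2, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        counts = Counter(y)
        minority_class = min(counts, key=counts.get)
        n_new = max(counts.values()) - counts[minority_class]
        self.members = []
        for _ in range(self.n_members):
            # draw a random feature subspace and project the data onto it
            feats = rng.sample(range(len(X[0])), self.subspace_size)
            Xs = [tuple(x[f] for f in feats) for x in X]
            # oversample the minority class within this subspace only
            minority = [xi for xi, yi in zip(Xs, y) if yi == minority_class]
            syn = smote(minority, n_new, rng=rng)
            clf = NearestCentroid().fit(Xs + syn,
                                        list(y) + [minority_class] * n_new)
            self.members.append((feats, clf))
        return self

    def predict(self, X):
        votes = [Counter() for _ in X]
        for feats, clf in self.members:
            preds = clf.predict([tuple(x[f] for f in feats) for x in X])
            for v, p in zip(votes, preds):
                v[p] += 1
        return [v.most_common(1)[0][0] for v in votes]
```

In the paper's setting the stand-in base classifier would be replaced by the three evaluated base learners, and the simplified `smote` helper by a full SMOTE implementation such as the one in imbalanced-learn.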

Keywords

Imbalanced classification, SMOTE, Random Subspace, Classifier ensembles

Acknowledgements

This work was supported by the Polish National Science Center under the grant no. UMO-2015/19/B/ST6/01597 and by the statutory fund of the Faculty of Electronics, Wroclaw University of Science and Technology.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

1. Wrocław University of Science and Technology, Wrocław, Poland