A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification

  • Jakub Klikowski
  • Paweł Ksieniewicz (email author)
  • Michał Woźniak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11872)

Abstract

Imbalanced data classification remains a focus of intense research due to its ever-growing presence in real-life decision tasks. In this article, we focus on a classifier ensemble for imbalanced data classification. The ensemble is formed from individual classifiers trained on feature subsets selected in a supervised manner. Several methods employ this concept to ensure a highly diverse ensemble; nevertheless, most of them, such as Random Subspace or Random Forest, select the attributes for a particular classifier randomly. The main drawback of these methods is that they offer no way to supervise and control this selection. In the following work, we apply a genetic algorithm to the considered problem. The proposition formulates an original learning criterion that takes into consideration not only the overall classification performance but also ensures that the trained ensemble is characterised by high diversity. The experimental study confirmed the high efficiency of the proposed algorithm and its superiority to other ensemble-forming methods based on random feature selection.
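The paper's algorithm is not reproduced on this page, but the idea outlined in the abstract — evolving per-member feature subsets under a fitness that rewards both classification performance and ensemble diversity — can be sketched as follows. This is a minimal illustration, not the authors' method: the base learner (Gaussian naive Bayes), ensemble size, the diversity weight, and all genetic-algorithm parameters are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic imbalanced problem (90% majority / 10% minority).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

K = 5                    # ensemble size (assumption)
N_FEAT = X.shape[1]

def member_predictions(masks, X_fit, y_fit, X_eval):
    """Fit one base classifier per boolean feature mask, return their predictions."""
    return np.array([GaussianNB().fit(X_fit[:, m], y_fit).predict(X_eval[:, m])
                     for m in masks])

def fitness(masks):
    """Balanced accuracy of the majority vote plus mean pairwise disagreement.

    For simplicity, fitness is evaluated on the training set; a held-out
    validation split would be preferable in practice.
    """
    if any(m.sum() == 0 for m in masks):        # empty subsets are invalid
        return -np.inf
    preds = member_predictions(masks, X_tr, y_tr, X_tr)
    vote = (preds.mean(axis=0) > 0.5).astype(int)
    acc = balanced_accuracy_score(y_tr, vote)
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    diversity = np.mean([(preds[i] != preds[j]).mean() for i, j in pairs])
    return acc + 0.5 * diversity                # 0.5 weight is an assumption

def mutate(masks, rate=0.1):
    """Flip each feature bit independently with the given probability."""
    return np.logical_xor(masks, rng.random(masks.shape) < rate)

# Each individual = K boolean feature masks; evolve with elitism,
# binary tournament selection, and bit-flip mutation.
pop = rng.random((20, K, N_FEAT)) < 0.5
for _ in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argmax(scores)]
    children = []
    for _ in range(len(pop) - 1):
        a, b = rng.integers(len(pop), size=2)
        children.append(mutate(pop[a] if scores[a] >= scores[b] else pop[b]))
    pop = np.array([elite] + children)

best = pop[0]  # elite is kept at index 0
preds = member_predictions(best, X_tr, y_tr, X_te)
vote = (preds.mean(axis=0) > 0.5).astype(int)
print(round(balanced_accuracy_score(y_te, vote), 3))
```

Note the design choice carried over from the abstract: the fitness deliberately mixes a performance term with an explicit diversity term, so the search does not collapse to K copies of the single best feature subset.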

Keywords

Machine learning · Classification · Imbalanced data · Feature selection · Genetic algorithm

Notes

Acknowledgments

This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325 as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jakub Klikowski (1)
  • Paweł Ksieniewicz (1, email author)
  • Michał Woźniak (1)

  1. Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland