Classifier Selection for Highly Imbalanced Data Streams with Minority Driven Ensemble

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11508)


The nature of the analysed data can make many practical data mining tasks difficult. This work focuses on two important research topics in data analysis: data stream classification and the analysis of data with imbalanced class distributions. We propose a novel classification method that employs a classifier selection approach and can update its model when new data arrive. The proposed approach has been evaluated in computer experiments carried out on a diverse pool of non-stationary data streams. The results confirm the usefulness of the proposed concept, which can outperform state-of-the-art classifier selection algorithms, especially for highly imbalanced data streams.


Keywords: Data streams, Concept drift, Imbalanced data, Classifier selection



This work was supported by the Polish National Science Centre under grant No. 2017/27/B/ST6/01325 as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology, Wrocław, Poland
