Multi Sampling Random Subspace Ensemble for Imbalanced Data Stream Classification

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 977)


Data stream classification is a frequently studied problem. Incoming data tend to change their characteristics over time (concept drift), and the class distribution is often imbalanced, i.e., the classes are represented by unequal numbers of learning examples. The combination of these two phenomena poses an additional challenge. In this article, we propose MSRS (Multi Sampling Random Subspace Ensemble), a novel chunk-based ensemble method for imbalanced non-stationary data stream classification. The proposed algorithm combines the random subspace approach with data balancing by various sampling methods to ensure appropriate diversity of the classifier ensemble. MSRS has been evaluated in computer experiments on a diverse pool of non-stationary imbalanced data streams.
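The abstract names the ingredients of the method (data chunks, random feature subspaces, several sampling methods) but does not specify how they are combined. The following is therefore only a minimal sketch of the general idea, not the authors' MSRS algorithm. All function names, the two samplers (random over- and undersampling), the nearest-centroid base classifier, the binary 0/1 labels, and the majority vote are assumptions made for illustration.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until both classes have equal size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]


def random_undersample(X, y, rng):
    """Keep a random subset of each class, as large as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    return X[keep], y[keep]


class NearestCentroid:
    """Toy base classifier (an assumption): predict the class of the closest mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dist, axis=1)]


# Assumed sampler pool; the paper may use different or additional methods.
SAMPLERS = (random_oversample, random_undersample)


def train_on_chunk(X, y, n_members, subspace_size, rng):
    """Fit one ensemble member per (random feature subspace, sampler) pair."""
    members = []
    for i in range(n_members):
        features = rng.choice(X.shape[1], size=subspace_size, replace=False)
        sampler = SAMPLERS[i % len(SAMPLERS)]  # cycle through the samplers
        X_bal, y_bal = sampler(X[:, features], y, rng)
        members.append((features, NearestCentroid().fit(X_bal, y_bal)))
    return members


def predict_vote(members, X):
    """Majority vote over members; assumes labels 0/1, ties go to class 1."""
    votes = np.stack([clf.predict(X[:, features]) for features, clf in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

In a chunk-based setting, `train_on_chunk` would be called once per incoming chunk, and `predict_vote` would label the next chunk before its true labels arrive. How members from older chunks are weighted or discarded under concept drift is not described in the abstract, so the sketch leaves that part out.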


Keywords: Ensemble learning; Imbalanced data; Concept drift; Data stream



This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Wrocław University of Science and Technology, Wrocław, Poland
