
Imbalanced Data Stream Classification Using Hybrid Data Preprocessing

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1168)

Abstract

Imbalanced data streams have attracted considerable attention from researchers in recent years. This area of research remains greatly underdeveloped, and numerous inherent difficulties must be addressed when designing algorithms that can operate in such a dynamic environment and still achieve satisfactory predictive performance. This paper proposes a novel algorithm that combines over- and under-sampling techniques to create a more robust classifier dedicated to imbalanced data streams. The efficiency and high predictive quality of the proposed method have been confirmed by extensive experiments carried out on real and computer-generated data streams.
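The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of hybrid resampling on a chunked stream, not the authors' method. It assumes plain random over-sampling and random under-sampling, a fixed chunk size, and a target class size equal to the mean class size in each chunk; the names hybrid_resample and stream_chunks are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


def hybrid_resample(chunk, target=None, seed=0):
    """Balance one chunk of (features, label) pairs.

    Assumption: classes larger than `target` are randomly under-sampled and
    classes smaller than `target` are randomly over-sampled with replacement.
    """
    if not chunk:
        return []
    rng = random.Random(seed)
    by_label = {}
    for x, y in chunk:
        by_label.setdefault(y, []).append((x, y))
    if target is None:
        # Assumption: aim for the mean class size observed in this chunk.
        target = max(1, round(len(chunk) / len(by_label)))
    balanced = []
    for items in by_label.values():
        if len(items) > target:
            # Under-sampling: keep a random subset of the larger class.
            balanced.extend(rng.sample(items, target))
        else:
            # Over-sampling: duplicate random examples of the smaller class.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            balanced.extend(items + extra)
    rng.shuffle(balanced)
    return balanced


def stream_chunks(stream, chunk_size):
    """Yield consecutive fixed-size chunks from an iterable stream."""
    chunk = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


if __name__ == "__main__":
    # Synthetic stream with roughly a 1:9 minority-to-majority ratio.
    rng = random.Random(42)
    stream = [((rng.random(), rng.random()), 1 if i % 10 == 0 else 0)
              for i in range(200)]
    for chunk in stream_chunks(stream, 100):
        balanced = hybrid_resample(chunk)
        print(Counter(label for _, label in balanced))
```

In a real stream setting each balanced chunk would be passed to an incremental or ensemble classifier; that step is left out here because the abstract does not state which classifier is used.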

Keywords

Imbalanced data, Data stream classification, Data preprocessing

Notes

Acknowledgement

This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325 as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland
