SMOTE Algorithm Variations in Balancing Data Streams
Year after year, ever larger volumes of data are generated across many fields of application. Many of these sources require real-time processing and analysis, which has increased interest in data stream classification within machine learning. It is not rare for such applications to deal with skewed or imbalanced data. In this paper, we analyze the use of variations of the SMOTE oversampling algorithm for learning patterns from imbalanced data streams with different incremental learning ensemble algorithms.
Keywords: Data streams, Imbalanced learning, Synthetic oversampling, Classifier ensembles
This work is supported by the Polish National Science Center under Grant no. UMO-2015/19/B/ST6/01597, as well as by the statutory funds of the Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology.