Abstract
Mining data streams is one of the most vital fields in contemporary machine learning. An increasing number of real-world problems are characterized by both the volume and velocity of data, as well as by evolving characteristics. Learning from data streams assumes that new instances arrive continuously and that their properties may change over time due to a phenomenon known as concept drift. To adapt well to such non-stationary problems, classifiers must not only be accurate and able to continuously accommodate new instances, but also be fast and computationally inexpensive. A very challenging subfield of this domain is imbalanced data stream mining. It combines the difficulties of streaming and imbalanced data and introduces a plethora of new ones. Algorithms designed for such scenarios must be flexible enough to quickly adapt to changing decision boundaries, imbalance ratios, and roles of classes. In this chapter we discuss the basics of data stream mining methods and review existing skew-insensitive algorithms. Background on data streams is given in Sect. 11.1. Section 11.2 discusses in depth the learning difficulties present in imbalanced data streams. Data-level and algorithm-level methods for skewed data streams are discussed in Sect. 11.3, while ensemble learners are overviewed in Sect. 11.4. Section 11.5 concentrates on the issue of emerging and disappearing classes, while Sect. 11.6 deals with limited access to ground truth in streaming scenarios. Finally, Sect. 11.7 concludes the chapter and presents future challenges in the field of learning from imbalanced data streams.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Learning from Imbalanced Data Streams. In: Learning from Imbalanced Data Sets. Springer, Cham. https://doi.org/10.1007/978-3-319-98074-4_11
DOI: https://doi.org/10.1007/978-3-319-98074-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98073-7
Online ISBN: 978-3-319-98074-4
eBook Packages: Computer Science, Computer Science (R0)