Learning from Imbalanced Data Streams

Learning from Imbalanced Data Sets

Abstract

Mining data streams is one of the most vital fields in contemporary machine learning. An increasing number of real-world problems are characterized by both the volume and the velocity of data, as well as by evolving characteristics. Learning from data streams assumes that new instances arrive continuously and that their properties may change over time due to a phenomenon known as concept drift. To adapt well to such non-stationary problems, classifiers must not only be accurate and able to continuously accommodate new instances, but must also be fast and computationally inexpensive. A very challenging subfield of this domain is imbalanced data stream mining. It combines the difficulties of streaming and imbalanced data and introduces a plethora of new ones. Algorithms designed for such scenarios must be flexible enough to quickly adapt to changing decision boundaries, imbalance ratios, and roles of classes. In this chapter we discuss the basics of data stream mining methods and review existing skew-insensitive algorithms. Background on data streams is given in Sect. 11.1. Section 11.2 discusses in depth the learning difficulties present in imbalanced data streams. Data-level and algorithm-level methods for skewed data streams are discussed in Sect. 11.3, while ensemble learners are reviewed in Sect. 11.4. Section 11.5 concentrates on the issue of emerging and disappearing classes, while Sect. 11.6 deals with limited access to ground truth in streaming scenarios. Finally, Sect. 11.7 concludes the chapter and presents future challenges in the field of learning from imbalanced data streams.
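
To make the notion of changing imbalance ratios and class roles more concrete, the following is a minimal Python sketch, not a method taken from the chapter. It assumes a time-decayed count of class labels as the estimate of the current class proportions. The class name `DecayedClassPrior`, the decay factor of 0.95, and the synthetic label stream are all hypothetical choices made only for illustration.

```python
from collections import defaultdict


class DecayedClassPrior:
    """Exponentially decayed estimate of class proportions on a label stream.

    Hypothetical helper for illustration; older labels count less, so the
    estimate follows the current imbalance ratio rather than the global one.
    """

    def __init__(self, decay=0.95):
        self.decay = decay                 # assumed forgetting factor in (0, 1)
        self.weight = defaultdict(float)   # decayed count per class label

    def update(self, label):
        # Fade all existing counts, then credit the newly observed class.
        for c in self.weight:
            self.weight[c] *= self.decay
        self.weight[label] += 1.0

    def proportions(self):
        total = sum(self.weight.values())
        if total == 0:
            return {}
        return {c: w / total for c, w in self.weight.items()}

    def minority(self):
        # The class with the smallest current proportion, or None if empty.
        p = self.proportions()
        return min(p, key=p.get) if p else None


def demo_stream():
    # Synthetic labels (an assumption): class 1 is rare in the first 200
    # instances, then the roles swap and class 0 becomes rare.
    first = [1 if i % 10 == 0 else 0 for i in range(200)]
    second = [0 if i % 10 == 0 else 1 for i in range(200)]
    return first + second


tracker = DecayedClassPrior(decay=0.95)
minority_at = {}
for t, y in enumerate(demo_stream(), start=1):
    tracker.update(y)
    if t in (200, 400):
        minority_at[t] = tracker.minority()

print(minority_at)
```

In this sketch the tracker reports a different minority class before and after the swap, which is the kind of role change a skew-insensitive stream learner would have to follow. A smaller decay factor reacts faster but gives a noisier estimate.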

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Learning from Imbalanced Data Streams. In: Learning from Imbalanced Data Sets. Springer, Cham. https://doi.org/10.1007/978-3-319-98074-4_11

  • DOI: https://doi.org/10.1007/978-3-319-98074-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98073-7

  • Online ISBN: 978-3-319-98074-4

  • eBook Packages: Computer Science (R0)
