Low-Dimensional Representation Learning from Imbalanced Data Streams

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12712)

Abstract

Learning from data streams is one of the contemporary challenges in machine learning, and it is frequently complicated by class imbalance. In non-stationary environments, the ratios among classes, as well as their roles (majority and minority), may change over time. Class imbalance is usually alleviated by balancing the classes with resampling. However, resampling has limitations, such as a lack of adaptation to concept drift and the risk of shifting the true class distributions. In this paper, we propose a novel ensemble approach in which each new base classifier is built using a low-dimensional embedding. We use a class-dependent entropy linear manifold to find the most discriminative low-dimensional representation that is, at the same time, skew-insensitive. This allows us to address two challenging issues: (i) learning efficient classifiers from imbalanced and drifting streams without data resampling; and (ii) simultaneously tackling high-dimensional and imbalanced streams, which pose extreme challenges to existing classifiers. The proposed low-dimensional representation algorithm is a flexible plug-in that can work with any ensemble learning algorithm, making it a useful tool for difficult scenarios of learning from high-dimensional, imbalanced, and drifting data streams.
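
As a rough illustration of the ensemble scheme described in the abstract, the following Python sketch trains one base learner per incoming chunk on a low-dimensional projection and keeps only the most recent learners. It is not the authors' method: the class-dependent entropy linear manifold is replaced here by a much simpler stand-in (a one-dimensional projection onto the difference of per-class means, which weights both classes equally regardless of their sizes). Binary labels, chunk-wise processing, majority voting, and all names (fit_member, predict_member, ChunkEnsemble, make_chunk, max_members) are assumptions made for this example only.

```python
import random
from collections import deque


def fit_member(X, y):
    """Fit one base learner on a chunk: a 1-D projection plus a threshold.

    Stand-in for the paper's embedding (an assumption of this sketch): the
    projection direction is the difference of per-class means, so each class
    contributes equally no matter how many samples it has in the chunk.
    """
    dim = len(X[0])
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            return None  # chunk lacks one class, so no learner is built
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    w = [means[1][j] - means[0][j] for j in range(dim)]
    # Decision threshold halfway between the two projected class means.
    p0 = sum(wj * mj for wj, mj in zip(w, means[0]))
    p1 = sum(wj * mj for wj, mj in zip(w, means[1]))
    return w, (p0 + p1) / 2.0


def predict_member(member, x):
    """Project x onto the learned direction and compare with the threshold."""
    w, threshold = member
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > threshold else 0


class ChunkEnsemble:
    """Keeps the most recent base learners and combines them by majority vote."""

    def __init__(self, max_members=5):
        # deque(maxlen=...) silently drops the oldest learner when full.
        self.members = deque(maxlen=max_members)

    def partial_fit(self, X, y):
        member = fit_member(X, y)
        if member is not None:
            self.members.append(member)

    def predict(self, x):
        votes = sum(predict_member(m, x) for m in self.members)
        return 1 if 2 * votes > len(self.members) else 0


def make_chunk(rng, n=200, dim=10, minority_ratio=0.1):
    """Synthetic imbalanced chunk: class 1 is rare and shifted in two features."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_ratio else 0
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if label == 1:
            x[0] += 4.0
            x[1] += 4.0
        X.append(x)
        y.append(label)
    return X, y


rng = random.Random(0)
ensemble = ChunkEnsemble(max_members=5)
for _ in range(8):  # process the stream chunk by chunk, without resampling
    X_chunk, y_chunk = make_chunk(rng)
    ensemble.partial_fit(X_chunk, y_chunk)

minority_like = [4.0, 4.0] + [0.0] * 8
majority_like = [0.0] * 10
print(ensemble.predict(minority_like), ensemble.predict(majority_like))
```

Dropping the oldest learner when the ensemble is full is one simple way to follow a drifting stream; a weighting scheme driven by recent performance would be another option, and the abstract does not say which strategy the paper uses.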

Corresponding author

Correspondence to Bartosz Krawczyk.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Korycki, Ł., Krawczyk, B. (2021). Low-Dimensional Representation Learning from Imbalanced Data Streams. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_50

  • DOI: https://doi.org/10.1007/978-3-030-75762-5_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science, Computer Science (R0)
