
Stream-Based Active Learning with Verification Latency in Non-stationary Environments

Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Abstract

Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data, where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert for labels of the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, whereas in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may no longer be relevant. In this article, we investigate the influence of finite, time-variable, and unknown verification latency on AL approaches in the presence of concept drift. We propose PRopagate (PR), a latency-independent utility estimator that also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which distributes the labeling budget non-uniformly over time after a detected drift. A thorough experimental evaluation with both synthetic and real-world non-stationary datasets and different settings of verification latency and budget is conducted and analyzed. We empirically show that the proposed method consistently outperforms the state of the art. Additionally, we demonstrate that with a time-variable budget allocation it is possible to boost the performance of AL strategies without increasing the overall labeling budget.
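To make the setting concrete, the sketch below is a minimal Python toy, not the paper's actual PR algorithm or budget schedule. It shows how a stream-based AL loop might handle verification latency: queries wait in a queue until the oracle answers, a provisional label is propagated from the nearest stored sample in the meantime, and the labeling budget is temporarily boosted after a drift. All names (PendingQuery, dynamic_budget, propagate_label), the thresholds, and the assumption of a known drift time are illustrative choices made for this sketch.

    # Minimal, self-contained sketch of a stream-based active-learning loop with
    # verification latency and a drift-dependent labeling budget.  All names and
    # numbers here are illustrative assumptions, not the paper's implementation.
    import random
    from collections import deque
    from dataclasses import dataclass


    @dataclass
    class PendingQuery:
        """A label request that the (slow) oracle has not answered yet."""
        x: float          # queried sample (1-D toy feature)
        y_oracle: int     # label the oracle will eventually return
        asked_at: int     # time step at which the query was issued
        train_idx: int    # position of the provisional entry in the training set


    def dynamic_budget(base, steps_since_drift, boost=2.0, window=50):
        """Illustrative schedule: spend a larger share of the budget right after
        a detected drift and slightly less afterwards, so that the long-run
        average stays roughly at `base`."""
        return min(1.0, base * boost) if steps_since_drift < window else base * 0.9


    def propagate_label(x, train):
        """Guess the still-unknown label from the nearest already-stored sample,
        a crude stand-in for the label prediction idea."""
        if not train:
            return 0
        return min(train, key=lambda p: abs(p[0] - x))[1]


    def stream_al_loop(stream, base_budget=0.1, latency=20, drift_at=500):
        train = []            # [x, y] pairs used to adapt the model
        pending = deque()     # queries waiting for the oracle's answer
        for t, (x, y_true) in enumerate(stream):
            # 1. Oracle answers arrive `latency` steps after the query was issued:
            #    overwrite the propagated (guessed) label with the true one.
            while pending and t - pending[0].asked_at >= latency:
                q = pending.popleft()
                train[q.train_idx][1] = q.y_oracle
            # 2. Decide whether to query under the time-variable budget
            #    (the drift time is assumed known here only to keep the toy simple).
            since_drift = t - drift_at if t >= drift_at else 10 ** 9
            if random.random() < dynamic_budget(base_budget, since_drift):
                train.append([x, propagate_label(x, train)])   # provisional label
                pending.append(PendingQuery(x, y_true, t, len(train) - 1))
        return train


    if __name__ == "__main__":
        toy = [(random.gauss(0.0, 1.0), int(random.random() > 0.5)) for _ in range(1000)]
        print(len(stream_al_loop(toy)), "training pairs collected")

In a real system the drift time would come from a drift detector and the provisional labels from the model itself; the sketch only illustrates the bookkeeping of delayed oracle answers and a time-variable budget, which are the two ingredients the abstract describes.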



Author information

Correspondence to Andrea Castellani.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Castellani, A., Schmitt, S., Hammer, B. (2022). Stream-Based Active Learning with Verification Latency in Non-stationary Environments. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_22


  • DOI: https://doi.org/10.1007/978-3-031-15937-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15936-7

  • Online ISBN: 978-3-031-15937-4

  • eBook Packages: Computer Science, Computer Science (R0)
