Abstract
Data stream classification is an important problem in machine learning. Because the data are non-stationary and the underlying distribution changes over time (concept drift), the model must continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert for labels of the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, whereas in real-world scenarios the expert needs time to provide a queried label (verification latency), and by the time the requested labels arrive they may no longer be relevant. In this article, we investigate the influence of finite, time-variable, and unknown verification latency on AL approaches in the presence of concept drift. We propose PRopagate (PR), a latency-independent utility estimator that also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy that redistributes the labeling budget over time after a detected drift. We conduct a thorough experimental evaluation on both synthetic and real-world non-stationary datasets, under different settings of verification latency and budget. We empirically show that the proposed methods consistently outperform the state of the art. Additionally, we demonstrate that a time-variable allocation of the budget can boost the performance of AL strategies without increasing the overall labeling budget.
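To make the drift-dependent dynamic budget idea concrete, the sketch below shows one plausible shape such a strategy could take: the per-step query budget is temporarily boosted right after a detected drift and then decays back to the base rate. The function name, the exponential decay schedule, and all parameter values are illustrative assumptions for this sketch, not the allocation actually proposed in the paper.

```python
import math

def dynamic_budget(t, drift_time, base_budget=0.1, boost=0.3, decay=50.0):
    """Illustrative drift-dependent query budget (hypothetical schedule).

    Returns the probability of querying a label at time step t. Before any
    detected drift the base budget applies; after a drift at `drift_time`
    the budget is boosted and decays exponentially back to the base rate.
    """
    if drift_time is None or t < drift_time:
        return base_budget
    # Exponential decay of the extra budget spent after the drift.
    return base_budget + boost * math.exp(-(t - drift_time) / decay)
```

In a full strategy, the extra budget spent immediately after a drift would be compensated by spending less during stable phases, so that the average budget over the whole stream stays fixed; the sketch omits that bookkeeping for brevity.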
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Castellani, A., Schmitt, S., Hammer, B. (2022). Stream-Based Active Learning with Verification Latency in Non-stationary Environments. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_22
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4