Probabilistic Active Learning in Datastreams

  • Daniel Kottke
  • Georg Krempl
  • Myra Spiliopoulou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9385)


In recent years, stream-based active learning has become an intensively investigated research topic. In this work, we propose a new algorithm for stream-based active learning that decides immediately whether to acquire a label (selective sampling). To this end, we extend our pool-based Probabilistic Active Learning framework to data streams. In particular, we complement the notion of usefulness within a topological space (“spatial usefulness”) with the concept of “temporal usefulness”. To actively select the instances for which labels must be acquired, we introduce the Balanced Incremental Quantile Filter (BIQF), an algorithm that assesses the usefulness of instances in a sliding window and ensures that the predefined budget restrictions are met within a given tolerance window. We compare our approach to other active learning approaches for streams and show that our method is competitive.
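The abstract gives no pseudocode for BIQF, so the following is only a minimal Python sketch of the general idea it describes: score each arriving instance, compare the score against a quantile of the scores in a sliding window, and shift that threshold depending on whether label spending is ahead of or behind the budget. It is not the authors' algorithm. The class name `QuantileBudgetFilter`, its parameters, the linear balancing rule, and the hard guards at the edge of the tolerance window are all assumptions made for illustration.

```python
from collections import deque


class QuantileBudgetFilter:
    """Hypothetical, simplified quantile filter with a budget-balancing term."""

    def __init__(self, budget, window_size=100, tolerance=50):
        self.budget = budget                      # target fraction of labelled instances
        self.window = deque(maxlen=window_size)   # most recent usefulness scores
        self.tolerance = tolerance                # assumed allowed deviation, in labels
        self.seen = 0                             # instances processed so far
        self.acquired = 0                         # labels requested so far

    def decide(self, usefulness):
        """Return True if a label should be requested for this instance."""
        self.window.append(usefulness)
        self.seen += 1

        # Positive deficit: fewer labels acquired than the budget allows.
        deficit = self.budget * self.seen - self.acquired
        if deficit >= self.tolerance:
            acquire = True        # far behind budget: force an acquisition
        elif deficit <= -self.tolerance:
            acquire = False       # far ahead of budget: force a skip
        else:
            scores = sorted(self.window)  # O(w log w); acceptable for a sketch
            # Base threshold: the (1 - budget) quantile of the recent scores.
            idx = min(len(scores) - 1, int((1.0 - self.budget) * len(scores)))
            spread = scores[-1] - scores[0]
            # Lower the threshold when behind budget, raise it when ahead.
            threshold = scores[idx] - spread * deficit / self.tolerance
            acquire = usefulness >= threshold

        if acquire:
            self.acquired += 1
        return acquire
```

Calling `decide(score)` once per arriving instance returns the immediate labelling decision. Because of the two guards, the number of acquired labels in this sketch never deviates from `budget * seen` by more than `tolerance + 1`, whatever the score distribution looks like. A production version would replace the per-step sort with an incremental order-statistics structure.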


Keywords: Window Size · Decision Boundary · Uncertainty Sampling · Probabilistic Gain · Tolerance Window



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Daniel Kottke (1)
  • Georg Krempl (1)
  • Myra Spiliopoulou (1)
  1. Knowledge Management and Discovery Lab, Otto-von-Guericke-University, Magdeburg, Germany
