Probabilistic Active Learning in Datastreams

Kottke, Daniel; Krempl, Georg; Spiliopoulou, Myra

doi:10.1007/978-3-319-24465-5_13

Daniel Kottke¹⁶,
Georg Krempl¹⁶ &
Myra Spiliopoulou¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9385))

Included in the following conference series:

International Symposium on Intelligent Data Analysis

1611 Accesses
8 Citations

Abstract

In recent years, stream-based active learning has become an intensively investigated research topic. In this work, we propose a new algorithm for stream-based active learning that decides immediately whether to acquire a label (selective sampling). To this purpose, we extend our pool-based Probabilistic Active Learning framework into a framework for streams. In particular, we complement the notion of usefulness within a topological space (“spatial usefulness”) with the concept of “temporal usefulness”. To actively select the instances, for which labels must be acquired, we introduce the Balanced Incremental Quantile Filter (BIQF), an algorithm that assesses the usefulness of instances in a sliding window, ensuring that the predefined budget restrictions will be met within a given tolerance window. We compare our approach to other active learning approaches for streams and show the competitiveness of our method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Companion website: http://kmd.cs.ovgu.de/res/pals.
2.
More learning curves are available on http://kmd.cs.ovgu.de/res/pals.

References

Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 286–296. ACM, New York (2004)
Google Scholar
Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007)
Google Scholar
Chapelle, O.: Active learning for parzen window classifier. In: International Workshop on Artificial Intelligence and Statistics, pp. 49–56 (2005)
Google Scholar
Cheng, Y., Chen, Z., Liu, L., Wang, J., Agrawal, A., Choudhary, A.: Feedback-driven multiclass active learning for data streams. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM 2013, San Francisco, California, USA, pp. 1311–1320. ACM, New York (2013). doi:10.1145/2505515.2505528
Chu, W., Zinkevich, M., Li, L., Thomas, A., Tseng, B.: Unbiased online active learning in data streams. In: 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, USA (2011)
Google Scholar
Comer, D.: Ubiquitous b-tree. ACM Comput. Surv. 11(2), 121–137 (1979)
Article MathSciNet MATH Google Scholar
Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Mach. Learn. 28(2–3), 133–168 (1997)
Article MATH Google Scholar
Halchenko, Y.O., Hanke, M.: Open is not enough. Let’s take the next step: an integrated, community-driven computing platform for neuroscience. Front. Neuroinf. 6, 22 (2012)
Article Google Scholar
Harries, M.B., Sammut, C., Horn, K.: Extracting hidden context. Mach. Learn. 32, 101–126 (1998)
Article MATH Google Scholar
Huang, S., Dong, Y.: An active learning system for mining time-changing data streams. Intell. Data Anal. 11, 401–419 (2007)
Google Scholar
Ienco, D., Bifet, A., Žliobaitė, I., Pfahringer, B.: Clustering based active learning for evolving data streams. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds.) DS 2013. LNCS, vol. 8140, pp. 79–93. Springer, Heidelberg (2013)
Chapter Google Scholar
Krempl, G., Ha, C.T., Spiliopoulou, M.: Clustering-based optimised probabilistic active learning (copal). In: 18th International Conference on Discovery Science (DS), Banff (2015)
Google Scholar
Krempl, G., Kottke, D., Spiliopoulou, M.: Probabilistic active learning: towards combining versatility, optimality and efficiency. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds.) DS 2014. LNCS, vol. 8777, pp. 168–179. Springer, Heidelberg (2014)
Google Scholar
Krempl, G., Zliobaite, I., Brzezinski, D., Hllermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. SIGKDD Explor. 16(1), 1–10 (2014)
Article Google Scholar
Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: 17th Annual Intenational ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1–10 (1994)
Google Scholar
Lindstrom, P., Delany, S.J., Namee, B.M.: Handling concept drift in a text data stream constrained by high labelling cost. In: FLAIRS Conference (2010)
Google Scholar
Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes. In: Advances in Neural Information Processing Systems 14, pp. 841–848. MIT Press (2002)
Google Scholar
Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: International Conference on Machine Learning, ICML 2001, pp. 441–448. Morgan Kaufmann Publishers Inc., San Francisco (2001)
Google Scholar
Ryu, J.W., Kantardzic, M.M., Kim, M.-W., Ra Khil, A.: An efficient method of building an ensemble of classifiers in streaming data. In: Srinivasa, S., Bhatnagar, V. (eds.) BDA 2012. LNCS, vol. 7678, pp. 122–133. Springer, Heidelberg (2012)
Chapter Google Scholar
Settles, B.: Active Learning Literature Survey. University of Wisconsin, Madison (2010)
Google Scholar
Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 6, no. 1, pp. 1–114 (2012)
Google Scholar
Tomanek, K., Olsson, F.: A web survey on the use of active learning to support annotation of text data. In: NAACL HLT Workshop on Active Learning for Natural Language Processing, Stroudsburg, PA, USA, pp. 45–48 (2009)
Google Scholar
Wang, L., Luo, G., Yi, K., Cormode, G.: Quantiles over data streams: an experimental study. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 737–748. ACM, New York (2013)
Google Scholar
Wang, P., Zhang, P., Guo, L.: Mining multi-label data streams using ensemble-based active learning. In: SIAM Conference on Data Mining, pp. 1131–1140 (2012)
Google Scholar
Zhu, X., Zhang, P., Lin, X., Shi, Y.: Active learning from stream data using optimal weight classifier ensemble. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40(6), 1607–1621 (2010)
Article Google Scholar
Zliobaite, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 27–39 (2014)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Knowledge Management and Discovery Lab, Otto-von-Guericke-University, Universitätsplatz 2, 39106, Magdeburg, Germany
Daniel Kottke, Georg Krempl & Myra Spiliopoulou

Authors

Daniel Kottke
View author publications
You can also search for this author in PubMed Google Scholar
Georg Krempl
View author publications
You can also search for this author in PubMed Google Scholar
Myra Spiliopoulou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Daniel Kottke .

Editor information

Editors and Affiliations

Université de Saint-Etienne, Saint-Etienne, France
Elisa Fromont
Intelligent Systems Lab, University of Bristol Intelligent Systems Lab, Bristol, United Kingdom
Tijl De Bie
Informatics Section, Katholieke Universiteit Leuven, Leuven, Belgium
Matthijs van Leeuwen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kottke, D., Krempl, G., Spiliopoulou, M. (2015). Probabilistic Active Learning in Datastreams. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science(), vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_13

Download citation

DOI: https://doi.org/10.1007/978-3-319-24465-5_13
Published: 22 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24464-8
Online ISBN: 978-3-319-24465-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics