Clustering Based Active Learning for Evolving Data Streams

  • Conference paper
Discovery Science (DS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8140)

Abstract

Data labeling is an expensive and time-consuming task, so choosing which instances to label is becoming increasingly important. In the active learning setting, a classifier is trained by requesting labels for only a small fraction of all instances. While many works address this problem in non-streaming scenarios, few consider the data stream setting. In this paper we propose a new active learning approach for evolving data streams that uses a pre-clustering step to select the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster its examples and then select the best instances to train the learner. Clustering makes it possible to cover the whole data space and avoids oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies on real datasets. The results show that our proposal improves performance. Experiments on parameter sensitivity are also reported.
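The cluster-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm, informativeness measure, or budget allocation the paper uses, so everything here is an assumption made for illustration: plain k-means, a caller-supplied `uncertainty` function, a round-robin split of the labeling `budget` across clusters, and the hypothetical names `kmeans` and `select_for_labeling`. It should not be read as the authors' method.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (an assumed choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_for_labeling(
    batch: List[Point],
    uncertainty: Callable[[Point], float],
    k: int,
    budget: int,
    seed: int = 0,
) -> List[int]:
    """Return indices of the batch instances whose labels would be requested."""
    k = min(k, len(batch))  # assumes k >= 1 for a non-empty batch
    assign = kmeans(batch, k, seed=seed)
    # Per cluster: member indices ordered from most to least uncertain.
    queues = [
        sorted(
            (i for i, a in enumerate(assign) if a == c),
            key=lambda i: uncertainty(batch[i]),
            reverse=True,
        )
        for c in range(k)
    ]
    chosen: List[int] = []
    target = min(budget, len(batch))
    # Round-robin over clusters so no single region absorbs the whole budget.
    while len(chosen) < target:
        for q in queues:
            if q and len(chosen) < target:
                chosen.append(q.pop(0))
    return chosen


# Toy batch: two well-separated groups of three points each.
batch = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Hypothetical uncertainty score that is highest near the origin.
score = lambda p: 1.0 / (1.0 + p[0] + p[1])
print(sorted(select_for_labeling(batch, score, k=2, budget=2)))  # [0, 3]
```

In this toy batch, ranking by the uncertainty score alone would spend both label requests on the group near the origin, whereas the clustered selection takes one instance from each group. That is the coverage effect the abstract attributes to the pre-clustering step.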



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ienco, D., Bifet, A., Žliobaitė, I., Pfahringer, B. (2013). Clustering Based Active Learning for Evolving Data Streams. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds) Discovery Science. DS 2013. Lecture Notes in Computer Science, vol 8140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40897-7_6

  • DOI: https://doi.org/10.1007/978-3-642-40897-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40896-0

  • Online ISBN: 978-3-642-40897-7

  • eBook Packages: Computer Science, Computer Science (R0)
