Active Learning with Evolving Streaming Data

  • Indrė Žliobaitė
  • Albert Bifet
  • Bernhard Pfahringer
  • Geoff Holmes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

In learning to classify streaming data, obtaining the true labels may require major effort and incur excessive cost. Active learning focuses on learning an accurate model with as few labels as possible. Streaming data poses additional challenges for active learning, since the data distribution may change over time (concept drift) and classifiers need to adapt. Conventional active learning strategies concentrate on querying the most uncertain instances, which typically lie close to the decision boundary. If changes do not occur close to the boundary, they will be missed and classifiers will fail to adapt. In this paper we develop two active learning strategies for streaming data that explicitly handle concept drift. They are based on uncertainty, dynamic allocation of labeling effort over time, and randomization of the search space. We empirically demonstrate that these strategies react well to changes that can occur unexpectedly and anywhere in the instance space.
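
To make the three ingredients named in the abstract concrete (uncertainty, labeling effort allocated over time, and randomization), the following is a minimal illustrative sketch of one way they could be combined in a label-query rule. It is not the paper's algorithm: the class name, the parameters `budget`, `step` and `noise_std`, and their default values are all assumptions introduced here for illustration.

```python
import random


class RandomizedUncertaintySampler:
    """Illustrative label-query rule for a data stream.

    Hypothetical sketch: names and defaults are assumptions,
    not taken from the paper.
    """

    def __init__(self, budget=0.1, step=0.01, noise_std=1.0, seed=0):
        self.budget = budget        # largest fraction of instances we may label
        self.step = step            # relative threshold change per decision
        self.noise_std = noise_std  # spread of the random threshold multiplier
        self.threshold = 1.0        # confidence threshold, adapted over time
        self.seen = 0               # instances observed so far
        self.queried = 0            # labels requested so far
        self.rng = random.Random(seed)

    def should_query(self, confidence):
        """Decide whether to request the true label.

        `confidence` is assumed to be the classifier's highest
        posterior probability for the incoming instance.
        """
        self.seen += 1
        # Budget check: stop querying while the labeled fraction is too high.
        if self.queried / self.seen >= self.budget:
            return False
        # Randomization: perturb the threshold so that instances far from
        # the decision boundary are still queried from time to time.
        randomized = self.threshold * self.rng.gauss(1.0, self.noise_std)
        if confidence < randomized:
            self.queried += 1
            # Labels are being requested: tighten the threshold.
            self.threshold *= 1.0 - self.step
            return True
        # No label requested: relax the threshold so that labeling effort
        # does not dry up when the classifier becomes confident.
        self.threshold *= 1.0 + self.step
        return False
```

In use, a stream learner would call `should_query` once per incoming instance and train on the instance only when it returns `True`. Because the threshold is perturbed randomly, a change that happens far from the current decision boundary still has some chance of being sampled, which is the failure mode of pure uncertainty sampling that the abstract describes.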

Keywords

Active Learning, Data Stream, Decision Boundary, Concept Drift, Streaming Data

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Indrė Žliobaitė (1, 2)
  • Albert Bifet (2)
  • Bernhard Pfahringer (2)
  • Geoff Holmes (2)
  1. Bournemouth University, Poole, UK
  2. University of Waikato, Hamilton, New Zealand
