Batch-Incremental versus Instance-Incremental Learning in Dynamic and Evolving Data

  • Jesse Read
  • Albert Bifet
  • Bernhard Pfahringer
  • Geoff Holmes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

Many real-world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically infinite stream of examples using limited time and memory, while being able to predict at any point. Two approaches dominate the literature: batch-incremental methods, which gather examples in batches to train models, and instance-incremental methods, which learn from each example as it arrives. Papers in the literature typically choose one of these approaches but provide insufficient evidence or references to justify the choice. We provide a first in-depth analysis comparing both approaches, including how they adapt to concept drift, and an extensive empirical study comparing several versions of each approach. Our results reveal the respective advantages and disadvantages of the methods, which we discuss in detail.
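
The distinction between the two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class names, the toy majority-class base learner, the batch size, and the number of retained models are all hypothetical. The instance-incremental learner updates one model per example; the batch-incremental learner buffers examples, trains a fresh model on each full batch, and keeps only the most recent models.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy base learner (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]


class InstanceIncremental:
    """Updates a single model with every example as it arrives."""

    def __init__(self):
        self.model = MajorityClass()

    def learn_one(self, x, y):
        self.model.learn_one(x, y)

    def predict(self, x):
        return self.model.predict(x)


class BatchIncremental:
    """Buffers examples, trains one model per full batch, keeps the newest models."""

    def __init__(self, batch_size=4, max_models=2):
        self.batch_size = batch_size
        self.buffer = []
        # A bounded deque silently discards the oldest model when full.
        self.models = deque(maxlen=max_models)

    def learn_one(self, x, y):
        self.buffer.append((x, y))
        if len(self.buffer) == self.batch_size:
            model = MajorityClass()
            for bx, by in self.buffer:
                model.learn_one(bx, by)
            self.models.append(model)
            self.buffer = []

    def predict(self, x):
        # Majority vote over the retained batch models.
        votes = Counter(m.predict(x) for m in self.models)
        if not votes:
            return None  # no batch has been completed yet
        return votes.most_common(1)[0][0]


# A synthetic stream with an abrupt change of concept: 12 "a" labels, then 8 "b".
stream = [(None, "a")] * 12 + [(None, "b")] * 8

inst = InstanceIncremental()
batch = BatchIncremental(batch_size=4, max_models=2)
for x, y in stream:
    inst.learn_one(x, y)
    batch.learn_one(x, y)

print(inst.predict(None), batch.predict(None))
```

On this synthetic stream the batch-incremental sketch forgets the old concept because it discards old models, whereas the instance-incremental sketch, which has no forgetting or drift-detection mechanism, still reflects all past examples. This behaviour is a property of the toy setup only and says nothing about the empirical results reported in the paper.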

Keywords

data streams, incremental, dynamic, evolving, on-line

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jesse Read (2)
  • Albert Bifet (1)
  • Bernhard Pfahringer (1)
  • Geoff Holmes (1)

  1. University of Waikato, Hamilton, New Zealand
  2. Universidad Carlos III, Madrid, Spain
