
Improving Adaptive Bagging Methods for Evolving Data Streams

  • Albert Bifet
  • Geoff Holmes
  • Bernhard Pfahringer
  • Ricard Gavaldà
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)

Abstract

We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging by using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up adaptation to change in ASHT Bagging, we add an error change detector to each classifier. We evaluate our improvements on synthetic and real-world datasets comprising up to ten million examples.
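
Both improvements follow the same pattern: an online bagging ensemble in which each member carries an explicit error change detector, and a member is replaced when its detector signals that the distribution has changed. The Python sketch below illustrates that pattern only; it is not the authors' implementation (which is written in Java on top of MOA). SimpleErrorChangeDetector is a simplified stand-in for ADWIN, MajorityClassLearner is a placeholder for the Hoeffding (Adaptive) Trees used in the paper, and all names and parameters are hypothetical.

import random
from collections import deque


class SimpleErrorChangeDetector:
    # Simplified stand-in for ADWIN: signals change when the error rate in a
    # short recent window rises well above the long-run error rate.
    def __init__(self, window=100, threshold=0.15):
        self.recent = deque(maxlen=window)
        self.total_errors = 0
        self.total_seen = 0
        self.threshold = threshold

    def update(self, error):  # error is 0 (correct) or 1 (mistake)
        self.recent.append(error)
        self.total_errors += error
        self.total_seen += 1
        if self.total_seen < 2 * self.recent.maxlen:
            return False  # not enough evidence yet
        recent_rate = sum(self.recent) / len(self.recent)
        overall_rate = self.total_errors / self.total_seen
        return recent_rate - overall_rate > self.threshold


class MajorityClassLearner:
    # Placeholder base learner; the paper uses Hoeffding (Adaptive) Trees.
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def partial_fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class AdaptiveOnlineBagging:
    # Oza-Russell online bagging: each example is shown to each member
    # Poisson(1) times. Each member has its own change detector; when the
    # detector fires, that member (and its detector) is reset.
    def __init__(self, n_members=10, base=MajorityClassLearner, seed=1):
        self.base = base
        self.members = [base() for _ in range(n_members)]
        self.detectors = [SimpleErrorChangeDetector() for _ in range(n_members)]
        self.rng = random.Random(seed)

    def _poisson1(self):
        # Knuth's method for sampling Poisson with lambda = 1.
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p < 0.36787944117144233:  # e^{-1}
                return k
            k += 1

    def predict(self, x):
        votes = {}
        for m in self.members:
            y = m.predict(x)
            if y is not None:
                votes[y] = votes.get(y, 0) + 1
        return max(votes, key=votes.get) if votes else None

    def partial_fit(self, x, y):
        for i in range(len(self.members)):
            member, detector = self.members[i], self.detectors[i]
            error = int(member.predict(x) != y)
            if detector.update(error):
                # Change detected: replace this member and its detector.
                member = self.base()
                self.members[i] = member
                self.detectors[i] = SimpleErrorChangeDetector()
            for _ in range(self._poisson1()):
                member.partial_fit(x, y)

A learner like this is trained one example at a time, e.g. calling model.predict(x) before model.partial_fit(x, y) for prequential evaluation; the per-member reset logic is the only part that differs from plain online bagging.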

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Albert Bifet (1)
  • Geoff Holmes (1)
  • Bernhard Pfahringer (1)
  • Ricard Gavaldà (2)
  1. University of Waikato, Hamilton, New Zealand
  2. Universitat Politècnica de Catalunya, Barcelona, Spain
