
Leveraging Bagging for Evolving Data Streams

  • Albert Bifet
  • Geoff Holmes
  • Bernhard Pfahringer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)

Abstract

Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. Attempts have been made to reproduce these methods in the more challenging context of evolving data streams. In this paper, we propose a new variant of bagging, called leveraging bagging. This method combines the simplicity of bagging with added randomization of the inputs and outputs of the classifiers. We test our method with an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
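
The paper itself specifies the exact algorithm; as a rough, hypothetical sketch (not the authors' MOA implementation), the input-randomization half of the idea can be written as online bagging in which each incoming example is presented to every ensemble member with a Poisson(λ) weight for λ greater than 1, rather than the Poisson(1) weight of classical online bagging. The SGDClassifier base learner and the λ = 6 default below are illustrative assumptions.

    # Minimal sketch of the input-randomization step of leveraging bagging,
    # assuming a scikit-learn-style incremental base learner (SGDClassifier
    # is an illustrative stand-in, not the learner used in the paper).
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    class LeveragingBaggingSketch:
        def __init__(self, n_estimators=10, lam=6.0, classes=None, seed=0):
            self.rng = np.random.default_rng(seed)
            self.lam = lam            # lambda > 1: heavier, more diverse resampling
            self.classes = classes    # full label set, needed by partial_fit
            self.members = [SGDClassifier(loss="log_loss")
                            for _ in range(n_estimators)]

        def partial_fit(self, x, y):
            # Each member weights the example by k ~ Poisson(lambda);
            # classical online bagging is the special case lambda = 1.
            x = np.atleast_2d(x)
            for m in self.members:
                k = self.rng.poisson(self.lam)
                if k > 0:
                    m.partial_fit(x, [y], classes=self.classes,
                                  sample_weight=[float(k)])

        def predict(self, x):
            # Unweighted majority vote over the ensemble members.
            x = np.atleast_2d(x)
            votes = [m.predict(x)[0] for m in self.members]
            values, counts = np.unique(votes, return_counts=True)
            return values[np.argmax(counts)]

The full method also monitors each ensemble member with the ADWIN change detector, replacing members whose error rises after a drift, and randomizes the output side via random output codes; both parts are omitted from this sketch.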


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Albert Bifet¹
  • Geoff Holmes¹
  • Bernhard Pfahringer¹

  1. University of Waikato, Hamilton, New Zealand
