Algorithm Selection on Data Streams

  • Jan N. van Rijn
  • Geoffrey Holmes
  • Bernhard Pfahringer
  • Joaquin Vanschoren
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8777)

Abstract

We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment, we calculate the characteristics of a small sample of a data stream and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability.
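
The window-based selection described in the second experiment can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' setup: the two base learners (`MajorityClass`, `NoChange`) are simple stand-ins for real stream classifiers, the meta-features (class entropy and label change rate) are only two examples of measurable window characteristics, and the meta-classifier is a plain 1-nearest-neighbour lookup over earlier windows. The function and class names are hypothetical.

```python
import math
from collections import Counter


class MajorityClass:
    """Stand-in base learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class NoChange:
    """Stand-in base learner: predicts the label of the previous instance."""
    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


def meta_features(window):
    """Two example window characteristics: class entropy and label change rate."""
    labels = [y for _, y in window]
    n = len(labels)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(labels).values())
    changes = sum(a != b for a, b in zip(labels, labels[1:])) / max(n - 1, 1)
    return (entropy, changes)


def select_per_window(stream, window_size, learners):
    """Return (learner chosen for each window, accuracy of the selection)."""
    names = list(learners)
    meta_train = []    # pairs: (features of window w, best learner on window w+1)
    prev_feats = None  # features of the most recently completed window
    chosen, correct, total = [], 0, 0
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        # Meta-level prediction (assumed 1-NN): reuse the outcome that followed
        # the earlier window whose characteristics are closest to the last one.
        if meta_train:
            pick = min(meta_train, key=lambda m: math.dist(m[0], prev_feats))[1]
        else:
            pick = names[0]  # no meta-level training data yet
        chosen.append(pick)
        # Test-then-train: each learner predicts an instance before learning it.
        hits = dict.fromkeys(names, 0)
        for x, y in window:
            for name in names:
                hits[name] += learners[name].predict(x) == y
                learners[name].learn(x, y)
        correct += hits[pick]
        total += len(window)
        best = max(names, key=lambda name: hits[name])
        if prev_feats is not None:
            meta_train.append((prev_feats, best))
        prev_feats = meta_features(window)
    return chosen, correct / total


if __name__ == "__main__":
    # Toy stream of (features, label) pairs with labels arriving in runs of five.
    toy = [(None, (i // 5) % 2) for i in range(60)]
    picks, acc = select_per_window(
        toy, 10, {"majority": MajorityClass(), "nochange": NoChange()})
    print(picks, acc)
```

In this sketch all base learners are trained on every instance, and only the prediction of the selected learner counts towards the reported accuracy. That keeps the comparison between candidate learners fair from window to window, at the cost of running every learner in parallel.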

Keywords

Meta Learning, Data Stream Mining

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jan N. van Rijn (1)
  • Geoffrey Holmes (2)
  • Bernhard Pfahringer (2)
  • Joaquin Vanschoren (3)
  1. Leiden University, Leiden, Netherlands
  2. University of Waikato, Hamilton, New Zealand
  3. Eindhoven University of Technology, Eindhoven, Netherlands
