
Ensembles on Random Patches

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7523)

Abstract

In this paper, we consider supervised learning under the assumption that the available memory is small compared to the dataset size. This general framework is relevant to big data, distributed databases, and embedded systems. We investigate a simple yet effective ensemble framework that builds each individual model of the ensemble from a random patch of data, obtained by drawing random subsets of both instances and features from the whole dataset. We carry out an extensive and systematic evaluation of this method on 29 datasets, using decision tree-based estimators. These experiments show that, compared with popular ensemble methods, the proposed method achieves accuracy on par while lowering memory requirements, and performs significantly better when memory is severely constrained.
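The construction described in the abstract maps naturally onto a bagging-style ensemble in which every base estimator sees only a random subset of the rows (instances) and a random subset of the columns (features). Below is a minimal sketch in Python using scikit-learn's BaggingClassifier, whose max_samples and max_features parameters control the two subsampling rates; the patch sizes, tree settings, and synthetic data are illustrative assumptions, not the configurations evaluated in the paper. (The estimator keyword was named base_estimator in scikit-learn versions before 1.2.)

    # Minimal sketch of the Random Patches idea (illustrative
    # parameter values; not the paper's experimental setup).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for a real dataset.
    X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each of the 100 trees is trained on a random patch: 25% of the
    # instances and 25% of the features, drawn without replacement
    # (bootstrap=False, bootstrap_features=False).
    model = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # base_estimator before sklearn 1.2
        n_estimators=100,
        max_samples=0.25,       # fraction of instances per patch
        max_features=0.25,      # fraction of features per patch
        bootstrap=False,
        bootstrap_features=False,
        random_state=0,
    )
    model.fit(X_train, y_train)
    print("test accuracy: %.3f" % model.score(X_test, y_test))

Because each base estimator is fit on only a fraction of the rows and columns, its memory footprint scales roughly with max_samples × max_features rather than with the full dataset, which is the trade-off underlying the memory argument in the abstract.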

Keywords

  • Memory Requirement
  • Average Rank
  • Ensemble Method
  • Base Estimator
  • Random Subspace

These keywords were generated by a machine, not by the authors. The process is experimental, and the keywords may be updated as the learning algorithm improves.



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Louppe, G., Geurts, P. (2012). Ensembles on Random Patches. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_28

  • DOI: https://doi.org/10.1007/978-3-642-33460-3_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33459-7

  • Online ISBN: 978-3-642-33460-3

  • eBook Packages: Computer Science, Computer Science (R0)