Snapshot boosting: a fast ensemble framework for deep neural networks

Abstract

Boosting has proven effective at improving the generalization of machine learning models in many fields. It produces highly diverse base learners and yields an accurate ensemble by combining a sufficient number of weak learners. However, it is rarely used in deep learning because training neural networks is expensive. Another method, snapshot ensemble, significantly reduces the training budget, but it struggles to balance training cost against diversity. Inspired by both snapshot ensemble and boosting, we propose a method named snapshot boosting. It combines several techniques to obtain many base models with high diversity and accuracy: the use of a validation set, a boosting-based training framework, and an effective ensemble strategy. Finally, we evaluate our method on computer vision (CV) and natural language processing (NLP) tasks; the results show that snapshot boosting achieves a more balanced trade-off between training expense and ensemble accuracy than other well-known ensemble methods.
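
As a rough illustration of the two ideas the abstract combines, the sketch below pairs a cyclic cosine learning-rate schedule (the kind used by snapshot ensembles) with an AdaBoost-style sample re-weighting step. It is not the authors' algorithm: the function names `cyclic_cosine_lr` and `reweight_samples`, the discrete AdaBoost update, and all constants are assumptions chosen for illustration only.

```python
import math


def cyclic_cosine_lr(step, total_steps, num_cycles, base_lr):
    """Cyclic cosine annealing: the rate restarts at base_lr each cycle.

    `step` is 0-based. All parameter names are hypothetical.
    """
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return 0.5 * base_lr * (math.cos(math.pi * pos / cycle_len) + 1.0)


def reweight_samples(weights, correct, eps=1e-12):
    """AdaBoost-style update (an assumed stand-in, not the paper's rule).

    weights: non-negative sample weights that sum to 1
    correct: booleans, True where the current snapshot predicted correctly
    Returns (new_weights, alpha); alpha is the snapshot's vote weight.
    """
    # Weighted error of the current snapshot, clipped to avoid log(0).
    err = sum(w for w, c in zip(weights, correct) if not c)
    err = min(max(err, eps), 1.0 - eps)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correct samples, grow weights of mistakes.
    raw = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    total = sum(raw)
    return [r / total for r in raw], alpha
```

In a training loop built on this sketch, one would save a model snapshot at the end of each learning-rate cycle, call `reweight_samples` with that snapshot's per-sample correctness, and train the next cycle on the re-weighted data. Whether the published method follows this exact order is not stated in the abstract.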

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (Grant No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.

Author information

Correspondence to Jiawei Jiang.

About this article

Cite this article

Zhang, W., Jiang, J., Shao, Y. et al. Snapshot boosting: a fast ensemble framework for deep neural networks. Sci. China Inf. Sci. 63, 112102 (2020). https://doi.org/10.1007/s11432-018-9944-x

Keywords

  • ensemble learning
  • deep learning
  • boosting
  • neural network
  • snapshot ensemble