Snapshot boosting: a fast ensemble framework for deep neural networks


Boosting has proven effective in improving the generalization of machine learning models in many fields. It can produce highly diverse base learners and build an accurate ensemble by combining a sufficient number of weak learners. However, it is rarely used in deep learning because of the high training cost of neural networks. Another method, snapshot ensemble, can significantly reduce the training cost, but it struggles to balance training cost against diversity. Inspired by snapshot ensemble and boosting, we propose a method named snapshot boosting. It combines several techniques to obtain many base models with high diversity and accuracy: the use of a validation set, a boosting-based training framework, and an effective ensemble strategy. Finally, we evaluate our method on computer vision (CV) and natural language processing (NLP) tasks, and the results show that snapshot boosting achieves a more balanced trade-off between training cost and ensemble accuracy than other well-known ensemble methods.
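The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two general ingredients it names, not the authors' method. It assumes a cyclic cosine learning-rate schedule with restarts (the kind of schedule commonly used to collect snapshots from a single training run) and a textbook AdaBoost-style sample reweighting step for binary classification. The function names `cyclic_cosine_lr` and `reweight`, and all parameters, are hypothetical and chosen for illustration.

```python
import math


def cyclic_cosine_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle.

    A snapshot of the model would typically be saved at the end of each
    cycle, when the learning rate is close to zero (assumption).
    """
    t = step % steps_per_cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * t / steps_per_cycle))


def reweight(weights, correct, eps=1e-12):
    """Textbook AdaBoost-style update for binary classification.

    weights: current per-sample weights.
    correct: booleans, True where the latest base model was right.
    Returns (normalized new weights, base-model coefficient alpha).
    """
    total = sum(weights)
    # Weighted error rate of the latest base model, clipped away from 0 and 1.
    err = sum(w for w, c in zip(weights, correct) if not c) / total
    err = min(max(err, eps), 1.0 - eps)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correctly classified samples, grow the others.
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    z = sum(new)
    return [w / z for w in new], alpha


if __name__ == "__main__":
    # Learning rate over two short cycles: it decays, then restarts.
    print([round(cyclic_cosine_lr(s, 5, 0.1), 4) for s in range(10)])
    # One reweighting step after a model that misclassifies the last sample.
    print(reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False]))
```

In a boosting-style loop, the reweighted samples would steer the next training cycle toward the examples the previous snapshot got wrong, which is one way to encourage diversity among base models. How snapshot boosting actually schedules the learning rate, uses the validation set, and combines the base models is described in the full paper.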





This work was supported by the National Natural Science Foundation of China (Grant Nos. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (Grant No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.

Author information



Corresponding author

Correspondence to Jiawei Jiang.


Cite this article

Zhang, W., Jiang, J., Shao, Y. et al. Snapshot boosting: a fast ensemble framework for deep neural networks. Sci. China Inf. Sci. 63, 112102 (2020).



  • ensemble learning
  • deep learning
  • boosting
  • neural network
  • snapshot ensemble