Abstract
Boosting has proven effective at improving the generalization of machine learning models in many fields: it produces base learners with high diversity and builds an accurate ensemble by combining a sufficient number of weak learners. However, it is rarely used in deep learning because of the high training cost of neural networks. Another method, snapshot ensembling, can significantly reduce the training cost, but it struggles to balance the trade-off between training cost and diversity. Inspired by snapshot ensembling and boosting, we propose a method named snapshot boosting. It combines several techniques to obtain many base models with both high diversity and high accuracy, including the use of a validation set, a boosting-based training framework, and an effective ensemble strategy. Finally, we evaluate our method on computer vision (CV) and natural language processing (NLP) tasks, and the results show that snapshot boosting achieves a better balance between training cost and ensemble accuracy than other well-known ensemble methods.
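To make the high-level description above concrete, the following is a minimal sketch, not the authors' exact algorithm, of how snapshot-style training with cyclic learning-rate restarts, validation-based snapshot scoring, and weighted prediction averaging could fit together. The function names (`train_snapshots`, `evaluate`, `weighted_ensemble`), the cosine restart schedule, and the accuracy-proportional weighting rule are illustrative assumptions standing in for the paper's boosting-based framework and ensemble strategy.

```python
# A minimal sketch (assumed, not the paper's exact method) of snapshot-boosting-style
# training: cyclic cosine learning-rate restarts yield one snapshot per cycle,
# each snapshot is scored on a held-out validation set, and the final prediction
# is a validation-weighted average of the snapshots' class probabilities.
import copy
import math
import torch
import torch.nn.functional as F

def train_snapshots(model, train_loader, val_loader, n_cycles=5,
                    epochs_per_cycle=10, lr_max=0.1, device="cpu"):
    model = model.to(device)
    snapshots, weights = [], []
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    for cycle in range(n_cycles):
        for epoch in range(epochs_per_cycle):
            # Cosine-annealed learning rate within each cycle (warm restart).
            t = epoch / epochs_per_cycle
            lr = 0.5 * lr_max * (1 + math.cos(math.pi * t))
            for g in opt.param_groups:
                g["lr"] = lr
            model.train()
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        # End of cycle: save a snapshot and score it on the validation set.
        acc = evaluate(model, val_loader, device)
        snapshots.append(copy.deepcopy(model).eval())
        weights.append(acc)
    # Normalize validation accuracies into ensemble weights
    # (a simple stand-in for a boosting-style weighting rule).
    total = sum(weights)
    weights = [w / total for w in weights]
    return snapshots, weights

@torch.no_grad()
def evaluate(model, loader, device):
    model.eval()
    correct, n = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        n += y.numel()
    return correct / max(n, 1)

@torch.no_grad()
def weighted_ensemble(snapshots, weights, x):
    # Combine snapshots via a weighted average of their softmax outputs.
    probs = [w * F.softmax(m(x), dim=1) for m, w in zip(snapshots, weights)]
    return torch.stack(probs).sum(dim=0)
```

Under these assumptions, the extra cost over training a single network is only the per-cycle validation pass; the diversity comes from the restarts, while the validation-weighted combination plays the role of the ensemble strategy described in the abstract.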
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (Grant No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.
Cite this article
Zhang, W., Jiang, J., Shao, Y. et al. Snapshot boosting: a fast ensemble framework for deep neural networks. Sci. China Inf. Sci. 63, 112102 (2020). https://doi.org/10.1007/s11432-018-9944-x
Keywords
- ensemble learning
- deep learning
- boosting
- neural network
- snapshot ensemble