Abstract
Boosting has been proven to be effective in improving the generalization of machine learning models in many fields. It is capable of getting high-diversity base learners and getting an accurate ensemble model by combining a sufficient number of weak learners. However, it is rarely used in deep learning due to the high training budget of the neural network. Another method named snapshot ensemble can significantly reduce the training budget, but it is hard to balance the tradeoff between training costs and diversity. Inspired by the ideas of snapshot ensemble and boosting, we propose a method named snapshot boosting. A series of operations are performed to get many base models with high diversity and accuracy, such as the use of the validation set, the boosting-based training framework, and the effective ensemble strategy. Last, we evaluate our method on the computer vision (CV) and the natural language processing (NLP) tasks, and the results show that snapshot boosting can get a more balanced trade-off between training expenses and ensemble accuracy than other well-known ensemble methods.
Similar content being viewed by others
References
Liu L, Du X, Zhu L, et al. Learning discrete hashing towards efficient fashion recommendation. Data Sci Eng, 2018, 3: 307–322
Abdelatti M, Yuan C Z, Zeng W, et al. Cooperative deterministic learning control for a group of homogeneous nonlinear uncertain robot manipulators. Sci China Inf Sci, 2018, 61: 112201
Arun K S, Govindan V K. A hybrid deep learning architecture for latent topic-based image retrieval. Data Sci Eng, 2018, 3: 166–195
Zhang C, Bengio S, Hardt M, et al. Understanding deep learning requires rethinking generalization. 2016. ArXiv: 1611.03530
Opitz D, Maclin R. Popular ensemble methods: an empirical study. J Artif Intell Res, 1999, 11: 169–198
Melville P, Mooney R J. Creating diversity in ensembles using artificial data. Inf Fusion, 2005, 6: 99–111
Jiang J, Cui B, Zhang C, et al. DimBoost: boosting gradient boosting decision tree to higher dimensions. In: Proceedings of the 2018 International Conference on Management of Data. New York: ACM, 2018. 1363–1376
Gao W, Zhou Z H. On the doubt about margin explanation of boosting. Artif Intell, 2013, 203: 1–18
Mosca A, Magoulas G D. Deep incremental boosting. 2017. ArXiv: 1708.03704
Quinlan J R. Bagging, boosting, and C4. 5. In: Proceedings of the 13th National Conference on Artificial Intelligence and 8th Innovative Applications of Artificial Intelligence Conference, Portland, 1996. 725–730
Huang G, Li Y, Pleiss G, et al. Snapshot ensembles: train 1, get M for free. 2017. ArXiv: 1704.00109
Loshchilov I, Hutter F. Sgdr: stochastic gradient descent with warm restarts. 2016. ArXiv: 1608.03983
Zhou Z H. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, 2012
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436–444
Dietterich T G. Ensemble methods in machine learning. In: Proceedings of the International Workshop on multiple Classifier Systems. Berlin: Springer, 2000. 1–15
Naftaly U, Intrator N, Horn D. Optimal ensemble averaging of neural networks. Netw-Comput Neural Syst, 1997, 8: 283–296
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770–778
Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. New York: Springer, 2001
Schwenk H, Bengio Y. Training methods for adaptive boosting of neural networks. In: Proceedings of the Advances in Neural Information Processing Systems, 1998. 647–653
Bucilu C, Caruana R, Niculescu-Mizil A. Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006. 535–541
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. 2015. ArXiv: 1503.02531
Breiman L. Stacked regressions. Mach Learn, 1996, 24: 49–64
van der Laan M J, Polley E C, Hubbard A E. Super learner. Stat Appl Genets Mol Biol, 2007, 6: 1
Young S, Abdou T, Bener A. Deep super learner: a deep ensemble for classification problems. In: Proceedings of the 31st Canadian Conference on Artificial Intelligence, Toronto, 2018. 84–95
Ju C, Bibaut A, van der Laan M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J Appl Stat, 2018, 45: 2800–2818
Seyyedsalehi S Z, Seyyedsalehi S A. A fast and efficient pre-training method based on layer-by-layer maximum discrimination for deep neural networks. Neurocomputing, 2015, 168: 669–680
Zhou Z H, Wu J, Tang W. Ensembling neural networks: many could be better than all. Artif Intell, 2002, 137: 239–263
Aho K, Derryberry D W, Peterson T. Model selection for ecologists: the worldviews of AIC and BIC. Ecology, 2014, 95: 631–636
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, 1995. 1137–1145
Brownlee J. Discover feature engineering, how to engineer features and how to get good at it. Machine Learning Process, 2014
Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res, 2003, 3: 1157–1182
Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 4700–4708
Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 3104–3112
Krizhevsky A, Hinton G. Learning Multiple Layers of Features From Tiny Images. Technical Report, University of Toronto, 2009
Lin M, Chen Q, Yan S. Network in network. 2013. ArXiv: 1312.4400
Maas A L, Daly R E, Pham P T, et al. Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. 142–150
Freund Y, Schapire R E. Experiments with a new boosting algorithm. In: Proceedings of the 13th International Conference on ML, 1996. 148–156
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant Nos. 61832001, 61702015, 61702016, 61572039), National Key Research and Development Program of China (Grant No. 2018YFB1004403), and PKU-Tencent Joint Research Lab.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, W., Jiang, J., Shao, Y. et al. Snapshot boosting: a fast ensemble framework for deep neural networks. Sci. China Inf. Sci. 63, 112102 (2020). https://doi.org/10.1007/s11432-018-9944-x
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11432-018-9944-x