Abstract
In this paper, we propose an efficient ensemble of sparse Convolutional Neural Networks (CNNs) with dynamic batch size. We address two issues at the heart of deep learning: speed and accuracy. First, we present ensemble CNNs with weighted-average stacking, which significantly increases testing accuracy. Second, we combine network pruning with Winograd-ReLU convolution to accelerate computation. Finally, motivated by electron movement in electric fields, we propose a novel dynamic batch size algorithm: we repeatedly increase the learning rate and the momentum coefficient until validation accuracy falls, while scaling up the batch size. With no data augmentation and little hyperparameter tuning, our method speeds up models on Fashion-MNIST, CIFAR-10, and CIFAR-100 by 1.55x, 2.86x, and 4.15x, with testing accuracy improvements of 2.66%, 1.37%, and 4.48%, respectively. We also demonstrate visually that our approach retains the most distinctive image classification features under exhaustive pruning.
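The dynamic batch size schedule described above can be summarized procedurally. The following is a minimal sketch of that idea, not the authors' implementation: the callback names (train_one_epoch, evaluate), the scaling factors, and the stopping rule are illustrative assumptions.

```python
# Sketch of the dynamic batch-size schedule summarized in the abstract:
# keep raising the learning rate and momentum, and scale the batch size,
# until validation accuracy falls. Caller supplies the training callbacks.

def dynamic_batch_schedule(train_one_epoch, evaluate,
                           batch_size=64, lr=0.01, momentum=0.9,
                           lr_scale=1.2, mom_scale=1.02, batch_scale=2,
                           max_epochs=50):
    """train_one_epoch(batch_size, lr, momentum) runs one epoch of SGD;
    evaluate() returns validation accuracy. Scaling factors are assumed."""
    best_acc = float("-inf")
    for _ in range(max_epochs):
        train_one_epoch(batch_size=batch_size, lr=lr,
                        momentum=min(momentum, 0.99))
        val_acc = evaluate()
        if val_acc < best_acc:
            break  # validation accuracy fell: stop growing the step and batch
        best_acc = val_acc
        lr *= lr_scale                               # push the learning rate up
        momentum *= mom_scale                        # and the momentum coefficient
        batch_size = int(batch_size * batch_scale)   # scale the batch size with them
    return batch_size, lr, momentum
```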
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, S., Wang, L., Gupta, G. (2021). Efficient Ensemble Sparse Convolutional Neural Networks with Dynamic Batch Size. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1378. Springer, Singapore. https://doi.org/10.1007/978-981-16-1103-2_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1102-5
Online ISBN: 978-981-16-1103-2