Abstract
In this paper, we propose an efficient ensemble of sparse Convolutional Neural Networks (CNNs) with dynamic batch size. We address two issues at the heart of deep learning: speed and accuracy. First, we present ensemble CNNs with weighted-average stacking, which significantly increases testing accuracy. Second, we combine network pruning with Winograd-ReLU convolution to accelerate computation. Finally, motivated by electron movement in electric fields, we propose a novel dynamic batch size algorithm: we repeatedly increase the learning rate and the momentum coefficient until validation accuracy falls, while scaling up the batch size. With no data augmentation and little hyperparameter tuning, our method speeds up models on Fashion-MNIST, CIFAR-10, and CIFAR-100 by 1.55x, 2.86x, and 4.15x, with testing accuracy improvements of 2.66%, 1.37%, and 4.48%, respectively. We also demonstrate visually that our approach retains the most distinctive image classification features under exhaustive pruning.
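The dynamic batch size schedule described above can be summarized procedurally. The following is a minimal sketch of that idea, not the authors' implementation: the callback names (train_one_epoch, evaluate), the scaling factors, and the stopping rule are illustrative assumptions.

```python
# Sketch of the dynamic batch-size schedule summarized in the abstract:
# keep raising the learning rate and momentum, and scale the batch size,
# until validation accuracy falls. Caller supplies the training callbacks.

def dynamic_batch_schedule(train_one_epoch, evaluate,
                           batch_size=64, lr=0.01, momentum=0.9,
                           lr_scale=1.2, mom_scale=1.02, batch_scale=2,
                           max_epochs=50):
    """train_one_epoch(batch_size, lr, momentum) runs one epoch of SGD;
    evaluate() returns validation accuracy. Scaling factors are assumed."""
    best_acc = float("-inf")
    for _ in range(max_epochs):
        train_one_epoch(batch_size=batch_size, lr=lr,
                        momentum=min(momentum, 0.99))
        val_acc = evaluate()
        if val_acc < best_acc:
            break  # validation accuracy fell: stop growing the step and batch
        best_acc = val_acc
        lr *= lr_scale                               # push the learning rate up
        momentum *= mom_scale                        # and the momentum coefficient
        batch_size = int(batch_size * batch_scale)   # scale the batch size with them
    return batch_size, lr, momentum
```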
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, S., Wang, L., Gupta, G. (2021). Efficient Ensemble Sparse Convolutional Neural Networks with Dynamic Batch Size. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1378. Springer, Singapore. https://doi.org/10.1007/978-981-16-1103-2_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1102-5
Online ISBN: 978-981-16-1103-2