
Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 938)

Abstract

The impact of the maximum feasible batch size (chosen for the best runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of a selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even with very small-scale use of Google TPUv2 units (8 cores only) compared to the powerful NVIDIA Tesla K80 GPU: up to 10x for the training stage (not taking overheads into account) and up to 2x for the prediction stage (both with and without overheads taken into account). The exact speedup depends on the utilization level of the TPUv2 units and grows with the volume of data being processed, but for the datasets used in this work (MNIST and Fashion-MNIST, with 28 × 28 images) a speedup was observed for batch sizes >512 images in the training phase and >40 000 images in the prediction phase. Notably, these results were obtained without detriment to prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset.
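The measurement the abstract describes amounts to timing the training and prediction phases of a fixed DNN while sweeping the batch size. The paper's own code is not reproduced on this page; the following is a minimal illustrative sketch in TensorFlow/Keras of that kind of benchmark. The model architecture, optimizer, epoch count, and batch-size grid are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal illustrative sketch (not the authors' published code): time one
# training epoch and one full prediction pass of a small Keras CNN on MNIST
# for several batch sizes. Architecture, optimizer, and the batch-size grid
# are assumptions. On a TPU, model construction would additionally be wrapped
# in a tf.distribute.TPUStrategy scope.
import time

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # (60000, 28, 28, 1)
x_test = x_test[..., None].astype("float32") / 255.0     # (10000, 28, 28, 1)


def build_model():
    """A small CNN classifier for 28 x 28 grayscale images (10 classes)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


for batch_size in (64, 256, 512, 2048, 8192):
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    start = time.perf_counter()                    # training-phase timing
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    train_s = time.perf_counter() - start

    start = time.perf_counter()                    # prediction-phase timing
    model.predict(x_test, batch_size=batch_size, verbose=0)
    predict_s = time.perf_counter() - start

    print(f"batch={batch_size:5d}  train={train_s:7.2f}s  "
          f"predict={predict_s:6.2f}s")
```

Sweeping the batch size like this makes the crossover reported in the abstract visible: small batches leave an accelerator underutilized, while large ones amortize per-step overheads, which is where the TPU's advantage over the GPU appears.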

Author information

Correspondence to Yuriy Kochura or Yuri Gordienko.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kochura, Y. et al. (2020). Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education II. ICCSEEA 2019. Advances in Intelligent Systems and Computing, vol 938. Springer, Cham. https://doi.org/10.1007/978-3-030-16621-2_61
