
Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 938)

Abstract

The impact of the maximum feasible batch size (chosen for the best runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of a selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even with very small-scale use of Google TPUv2 units (8 cores only) compared to the powerful NVIDIA Tesla K80 GPU: up to 10x for the training stage (not taking overheads into account) and up to 2x for the prediction stage (both with and without overheads taken into account). The exact speedup depends on the utilization level of the TPUv2 units and grows with the volume of data being processed, but for the datasets used in this work (MNIST and Fashion-MNIST, with 28 × 28 images) a speedup was observed for batch sizes >512 images in the training phase and >40 000 images in the prediction phase. Notably, these results were obtained without detriment to prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset.
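The measurement the abstract describes amounts to timing the training and prediction phases of a fixed DNN while sweeping the batch size. The paper's own code is not reproduced on this page; the following is a minimal illustrative sketch in TensorFlow/Keras of that kind of benchmark. The model architecture, optimizer, epoch count, and batch-size grid are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal illustrative sketch (not the authors' published code): time one
# training epoch and one full prediction pass of a small Keras CNN on MNIST
# for several batch sizes. Architecture, optimizer, and the batch-size grid
# are assumptions. On a TPU, model construction would additionally be wrapped
# in a tf.distribute.TPUStrategy scope.
import time

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # (60000, 28, 28, 1)
x_test = x_test[..., None].astype("float32") / 255.0     # (10000, 28, 28, 1)


def build_model():
    """A small CNN classifier for 28 x 28 grayscale images (10 classes)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


for batch_size in (64, 256, 512, 2048, 8192):
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    start = time.perf_counter()                    # training-phase timing
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    train_s = time.perf_counter() - start

    start = time.perf_counter()                    # prediction-phase timing
    model.predict(x_test, batch_size=batch_size, verbose=0)
    predict_s = time.perf_counter() - start

    print(f"batch={batch_size:5d}  train={train_s:7.2f}s  "
          f"predict={predict_s:6.2f}s")
```

Sweeping the batch size like this makes the crossover reported in the abstract visible: small batches leave an accelerator underutilized, while large ones amortize per-step overheads, which is where the TPU's advantage over the GPU appears.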

Author information

Correspondence to Yuriy Kochura or Yuri Gordienko.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kochura, Y. et al. (2020). Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education II. ICCSEEA 2019. Advances in Intelligent Systems and Computing, vol 938. Springer, Cham. https://doi.org/10.1007/978-3-030-16621-2_61
