Abstract
We investigate how the maximum feasible batch size (chosen for the best runtime) affects the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases. Numerous runs of a selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for very small-scale usage of Google TPUv2 units (8 cores only) compared with a fairly powerful NVIDIA Tesla K80 GPU: up to 10x for the training stage (excluding overheads) and up to 2x for the prediction stage (both including and excluding overheads). The exact speedup values depend on the utilization level of the TPUv2 units and grow with the volume of data being processed; for the datasets used in this work (MNIST and Fashion-MNIST, with images of size 28 × 28), speedup was observed for batch sizes >512 images in the training phase and >40,000 images in the prediction phase. Notably, these results were obtained without detriment to prediction quality: accuracy and loss agreed between the GPU and TPU runs to the 3rd significant digit for the MNIST dataset and to the 2nd significant digit for the Fashion-MNIST dataset.
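The batch-size dependence described above can be illustrated with a minimal timing harness. This is a hypothetical sketch, not the authors' benchmark code: `fake_step` stands in for a real DNN training step, with an assumed fixed per-step overhead (dispatch, host-device transfer) plus a per-image compute cost. Under that assumption, per-step overhead dominates at small batches, which is consistent with the paper's observation that speedup only appears beyond a threshold batch size.

```python
import math
import time

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # One epoch must cover all samples; the last batch may be partial.
    return math.ceil(n_samples / batch_size)

def benchmark(train_step, n_samples: int, batch_sizes):
    # Time one epoch's worth of train_step calls for each batch size.
    results = {}
    for bs in batch_sizes:
        t0 = time.perf_counter()
        for _ in range(steps_per_epoch(n_samples, bs)):
            train_step(bs)
        results[bs] = time.perf_counter() - t0
    return results

def fake_step(batch_size, overhead=1e-5, per_image=1e-8):
    # Stand-in for an accelerator step: fixed launch overhead
    # plus a small per-image cost (both values are illustrative).
    time.sleep(overhead + per_image * batch_size)

# MNIST-sized training set (60 000 images) at several batch sizes:
# larger batches amortize the fixed overhead over fewer steps.
times = benchmark(fake_step, 60_000, [32, 512, 8192])
```

In a real experiment, `fake_step` would be replaced by a compiled training step (e.g. a Keras `train_on_batch` call) and the same sweep would reveal the batch-size threshold at which one accelerator overtakes another.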
© 2020 Springer Nature Switzerland AG
Kochura, Y. et al. (2020). Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education II. ICCSEEA 2019. Advances in Intelligent Systems and Computing, vol 938. Springer, Cham. https://doi.org/10.1007/978-3-030-16621-2_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16620-5
Online ISBN: 978-3-030-16621-2
eBook Packages: Intelligent Technologies and Robotics