
Quantitative evaluation of deep learning frameworks in heterogeneous computing environment

  • Regular Paper
  • Published in: CCF Transactions on High Performance Computing

Abstract

Deep learning frameworks are powerful tools for model training. They dispatch operators by mapping them onto a series of kernel functions and launching these kernels on specialized devices such as GPUs. However, little is known about how efficiently different frameworks dispatch and map operators, even though these mechanisms directly affect training time. This paper presents a performance evaluation of various frameworks that examines their kernel function efficiency and operator dispatching mechanisms. We introduce two evaluation metrics, device computing time (DCT) and device occupancy ratio (DOR), based on the device’s active and idle states. To ensure comparable evaluation results, we propose a three-step verification method covering hyper-parameter, model, and updating-method equivalence. Because frameworks implement models inequivalently, we further present an equivalence adjustment method based on the number of operators. Our evaluation results demonstrate the device utilization capability of five frameworks, namely PyTorch, TensorFlow 1, TensorFlow 2, MXNet, and PaddlePaddle, and reveal the potential for further optimizing the training performance of deep learning frameworks.
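As a concrete reading of the two metrics, the sketch below computes DCT and DOR from a list of kernel execution intervals recorded on the device. This is a minimal illustration under our own assumptions, not the paper's implementation: it assumes DCT is the duration of the union of kernel-active intervals (so overlapping kernels are counted once) and DOR is DCT divided by the elapsed wall-clock time; all function and variable names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one kernel's execution, in seconds

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping kernel-activity intervals so concurrently
    running kernels are not double-counted as device-active time."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend previous span
        else:
            merged.append((start, end))
    return merged

def device_metrics(kernel_intervals: List[Interval],
                   wall_start: float, wall_end: float) -> Tuple[float, float]:
    """Return (DCT, DOR): device computing time is the total active time,
    device occupancy ratio is that time over the elapsed wall time."""
    active = merge_intervals(kernel_intervals)
    dct = sum(end - start for start, end in active)
    dor = dct / (wall_end - wall_start)
    return dct, dor

# Three kernels, two of them overlapping, inside a 10-second training window.
dct, dor = device_metrics([(0.0, 2.0), (1.5, 3.0), (6.0, 8.0)], 0.0, 10.0)
print(f"DCT = {dct:.1f} s, DOR = {dor:.0%}")  # DCT = 5.0 s, DOR = 50%
```

Read this way, DCT isolates kernel efficiency, while a low DOR indicates that the device sits idle between kernels, pointing at operator dispatching overhead on the host.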



Data availability

The data underlying this article are available in the article itself, which cites the datasets used. The code used in the experiments is open source and available on GitHub at https://github.com/LuZhengx/DLFrameBench.

Notes

  1. https://onnx.ai/.

  2. https://www.docker.com/.

  3. https://docs.nvidia.com/nvtx/.

  4. https://developer.nvidia.com/nsight-systems.

  5. https://github.com/NVIDIA/cutlass.

  6. https://eigen.tuxfamily.org.
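Notes 3 and 4 reference NVIDIA NVTX and Nsight Systems, which are commonly paired to attribute device active and idle time to training phases. As a hedged sketch (the torch.cuda.nvtx calls are PyTorch's real API, but the phase names are illustrative and the paper's actual instrumentation may differ), each phase of a training step can be wrapped in an NVTX range so it shows up as a labeled span on the Nsight Systems timeline:

```python
import torch

def train_step(model, optimizer, loss_fn, x, y):
    # Each range_push/range_pop pair becomes a named span when the script
    # is captured with Nsight Systems, e.g.: nsys profile python train.py
    torch.cuda.nvtx.range_push("forward")
    loss = loss_fn(model(x), y)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer_step")
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.nvtx.range_pop()
    return loss
```

Kernel start and end timestamps collected within each range can then be exported from Nsight Systems and fed into a DCT/DOR computation like the sketch following the abstract.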


Acknowledgements

This work is partially supported by the National Natural Science Foundation (62272248), the National Key Research and Development Program of China (2018YFB2100300), the Open Project Fund of State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCHA202108), the Natural Science Foundation of Tianjin of China (21JCZDJC00740, 21JCYBJC00760), the Key Research Project of Zhejiang Lab (2022PG0AC02), and the Key R&D Program of Zhejiang (2022C04006).

Author information


Correspondence to Xueshuo Xie or Tao Li.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lu, Z., Du, C., Jiang, Y. et al. Quantitative evaluation of deep learning frameworks in heterogeneous computing environment. CCF Trans. HPC 6, 94–111 (2024). https://doi.org/10.1007/s42514-023-00168-6

