
Quantitative evaluation of deep learning frameworks in heterogeneous computing environment

  • Regular Paper
  • Published in: CCF Transactions on High Performance Computing

Abstract

Deep learning frameworks are powerful tools for model training. They dispatch operators by mapping them onto a series of kernel functions and launching these kernels on specialized devices such as GPUs. However, little is known about how efficiently different frameworks dispatch and map operators, even though these mechanisms directly affect training time. This paper presents a performance evaluation of various frameworks that examines their kernel function efficiency and operator dispatching mechanisms. We introduce two evaluation metrics, device computing time (DCT) and device occupancy ratio (DOR), based on the device’s active and idle states. To ensure comparable evaluation results, we propose a three-step verification method covering hyper-parameter, model, and updating-method equivalence. Because frameworks implement models inequivalently, we further present an equivalence adjustment method based on the number of operators. Our evaluation results demonstrate the device utilization capability of five frameworks, namely PyTorch, TensorFlow 1, TensorFlow 2, MXNet, and PaddlePaddle, and reveal the potential for further optimizing the training performance of deep learning frameworks.
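As a concrete reading of the two metrics, the sketch below computes DCT and DOR from a list of kernel execution intervals recorded on the device. This is a minimal illustration under our own assumptions, not the paper's implementation: it assumes DCT is the duration of the union of kernel-active intervals (so overlapping kernels are counted once) and DOR is DCT divided by the elapsed wall-clock time; all function and variable names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one kernel's execution, in seconds

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping kernel-activity intervals so concurrently
    running kernels are not double-counted as device-active time."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend previous span
        else:
            merged.append((start, end))
    return merged

def device_metrics(kernel_intervals: List[Interval],
                   wall_start: float, wall_end: float) -> Tuple[float, float]:
    """Return (DCT, DOR): device computing time is the total active time,
    device occupancy ratio is that time over the elapsed wall time."""
    active = merge_intervals(kernel_intervals)
    dct = sum(end - start for start, end in active)
    dor = dct / (wall_end - wall_start)
    return dct, dor

# Three kernels, two of them overlapping, inside a 10-second training window.
dct, dor = device_metrics([(0.0, 2.0), (1.5, 3.0), (6.0, 8.0)], 0.0, 10.0)
print(f"DCT = {dct:.1f} s, DOR = {dor:.0%}")  # DCT = 5.0 s, DOR = 50%
```

Read this way, DCT isolates kernel efficiency, while a low DOR indicates that the device sits idle between kernels, pointing at operator dispatching overhead on the host.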



Data availability

The data underlying this article are available in the article itself, which cites the datasets used. The code used in the experiments is open source and available on GitHub at https://github.com/LuZhengx/DLFrameBench.

Notes

  1. https://onnx.ai/.

  2. https://www.docker.com/.

  3. https://docs.nvidia.com/nvtx/.

  4. https://developer.nvidia.com/nsight-systems.

  5. https://github.com/NVIDIA/cutlass.

  6. https://eigen.tuxfamily.org.
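Notes 3 and 4 reference NVIDIA NVTX and Nsight Systems, which are commonly paired to attribute device active and idle time to training phases. As a hedged sketch (the torch.cuda.nvtx calls are PyTorch's real API, but the phase names are illustrative and the paper's actual instrumentation may differ), each phase of a training step can be wrapped in an NVTX range so it shows up as a labeled span on the Nsight Systems timeline:

```python
import torch

def train_step(model, optimizer, loss_fn, x, y):
    # Each range_push/range_pop pair becomes a named span when the script
    # is captured with Nsight Systems, e.g.: nsys profile python train.py
    torch.cuda.nvtx.range_push("forward")
    loss = loss_fn(model(x), y)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer_step")
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.nvtx.range_pop()
    return loss
```

Kernel start and end timestamps collected within each range can then be exported from Nsight Systems and fed into a DCT/DOR computation like the sketch following the abstract.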


Acknowledgements

This work is partially supported by the National Natural Science Foundation (62272248), the National Key Research and Development Program of China (2018YFB2100300), the Open Project Fund of State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCHA202108), the Natural Science Foundation of Tianjin of China (21JCZDJC00740, 21JCYBJC00760), the Key Research Project of Zhejiang Lab (2022PG0AC02), and the Key R&D Program of Zhejiang (2022C04006).

Author information


Correspondence to Xueshuo Xie or Tao Li.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lu, Z., Du, C., Jiang, Y. et al. Quantitative evaluation of deep learning frameworks in heterogeneous computing environment. CCF Trans. HPC 6, 94–111 (2024). https://doi.org/10.1007/s42514-023-00168-6

