
Accelerating Large-Scale Deep Convolutional Neural Networks on Multi-core Vector Accelerators

  • Conference paper
Network and Parallel Computing (NPC 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12639)

Abstract

This paper proposes an efficient algorithm mapping method for accelerating deep convolutional neural networks, which includes: (1) an efficient transformation method that converts the computations of a CNN's convolutional and fully connected layers into large-scale matrix multiplications, and converts pooling-layer computations into matrix row computations; (2) a set of general and efficient vectorization methods for the convolutional, fully connected, and pooling layers on the vector accelerator. Experimental results on the accelerator show that the average computing efficiency of the convolutional and fully connected layers of AlexNet, VGG-19, GoogLeNet, and ResNet-50 is 93.3% and 93.4%, respectively, and the average data-access efficiency of the pooling layers is 70%.
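
The layer-to-GEMM transformation the abstract describes corresponds to the standard im2col-style lowering: each sliding window is unfolded into one column (or row) of a matrix, so convolution becomes a single large matrix multiplication and pooling becomes a row-wise reduction over the unfolded window matrix. The sketch below is a minimal NumPy illustration of that mapping under our own naming and layout assumptions (the functions `im2col`, `conv2d_as_gemm`, and `maxpool2d_as_rowreduce` are hypothetical); it is not the authors' accelerator implementation.

```python
import numpy as np

# Illustrative sketch of im2col-style lowering; not the paper's accelerator code.

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix,
    one column per sliding window, so convolution becomes one GEMM."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * out_w + j] = patch.reshape(-1)
    return cols, out_h, out_w

def conv2d_as_gemm(x, weights, stride=1):
    """Convolutional layer lowered to a matrix multiplication.
    weights: (K, C, kh, kw) filters; x: (C, H, W) input."""
    k, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    y = weights.reshape(k, c * kh * kw) @ cols   # one large GEMM: (K, out_h*out_w)
    return y.reshape(k, out_h, out_w)

def maxpool2d_as_rowreduce(x, kh, kw, stride):
    """Pooling lowered to a matrix row computation: each row of the
    unfolded matrix holds one window, and the row-wise max is the output."""
    pooled = []
    for ch in range(x.shape[0]):
        cols, out_h, out_w = im2col(x[ch:ch+1], kh, kw, stride)
        pooled.append(cols.T.max(axis=1).reshape(out_h, out_w))
    return np.stack(pooled)

x = np.random.randn(3, 8, 8).astype(np.float32)
w = np.random.randn(4, 3, 3, 3).astype(np.float32)
print(conv2d_as_gemm(x, w).shape)                # (4, 6, 6)
print(maxpool2d_as_rowreduce(x, 2, 2, 2).shape)  # (3, 4, 4)
```

A fully connected layer is the degenerate case of the same mapping: its weights are already a 2-D matrix, so the layer is directly a matrix multiplication with the flattened input.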

This work is supported by the National Natural Science Foundation of China (No. 61572025).

Author information

Correspondence to Zhong Liu or Sheng Ma.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Liu, Z., Ma, S., Li, C., Chen, H. (2021). Accelerating Large-Scale Deep Convolutional Neural Networks on Multi-core Vector Accelerators. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_6

  • DOI: https://doi.org/10.1007/978-3-030-79478-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79477-4

  • Online ISBN: 978-3-030-79478-1

  • eBook Packages: Computer Science, Computer Science (R0)
