
Mapping and virtual neuron assignment algorithms for MAERI accelerator

The Journal of Supercomputing

Abstract

To date, a number of deep learning accelerators (DLAs) have been proposed to address the challenges caused by the growing number of layers in deep neural networks (DNNs). GPU-based systems often face energy-efficiency problems because their parallel computing operations increase memory accesses, which in turn creates memory capacity, bandwidth, and delay challenges. DLA-based systems try to overcome these challenges and improve these parameters, although their flexibility remains a limitation. Several case studies have investigated the proposed DLAs and demonstrated the substantial effect of different mapping methods on reducing energy consumption and delay. We analyze MAERI's role in addressing this issue and the impact of mapping methods on the challenges induced by implementing different trained DNN models on such accelerators. This work proposes an algorithm for mapping and assigning virtual neurons (VNs) on the MAERI accelerator to improve its performance and cost. Simulation results demonstrate reductions in energy consumption and delay of approximately 21–92% and 14–21% when implementing AlexNet and VGG-16 on MAERI, respectively. The mapping method has a significant effect in increasing the proposed DLAs' performance and reducing their cost without redesigning their structures. The proposed VN assignment approach helps support different trained DNN models and increases the flexibility of DLA-based systems.
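
Although the full text is not shown here, the VN abstraction the abstract builds on can be sketched. In MAERI, a virtual neuron is a group of multiplier switches configured to compute one output activation, so its size follows the weight-kernel dimensions of the layer being mapped. The Python sketch below is a minimal illustration of one plausible assignment step under that assumption; the function `assign_virtual_neurons`, the `ConvLayer` container, and the greedy left-to-right packing policy are hypothetical, not the authors' published algorithm.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    R: int  # kernel height
    S: int  # kernel width
    C: int  # input channels

def assign_virtual_neurons(layer: ConvLayer, num_multipliers: int):
    """Partition a linear array of multiplier switches into virtual
    neurons (VNs), each large enough to compute one R x S x C dot
    product (one output activation)."""
    vn_size = layer.R * layer.S * layer.C        # multipliers per VN
    if vn_size > num_multipliers:
        # A real mapper would fold the computation across passes here.
        raise ValueError("VN larger than multiplier array; folding needed")
    num_vns = num_multipliers // vn_size         # VNs mapped concurrently
    # Greedy left-to-right packing; trailing switches stay idle.
    return [list(range(i * vn_size, (i + 1) * vn_size))
            for i in range(num_vns)]

# Hypothetical example: a 3x3x16 filter on a 512-multiplier array.
vns = assign_virtual_neurons(ConvLayer(R=3, S=3, C=16), num_multipliers=512)
print(len(vns), "VNs of size", len(vns[0]))      # -> 3 VNs of size 144
```

With such a partition, the reconfigurable adder network above each VN's multipliers can be set to reduce that VN's partial sums, which is what allows one fabric to serve layers of different shapes without redesigning the hardware.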





Author information

Correspondence to Seyedeh Yasaman Hosseini Mirmahaleh.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Reshadi, M., Mirmahaleh, S.Y.H. Mapping and virtual neuron assignment algorithms for MAERI accelerator. J Supercomput 78, 238–257 (2022). https://doi.org/10.1007/s11227-021-03893-3

