Abstract
In recent years, various deep learning accelerators (DLAs) have been proposed to address the challenges raised by the growing depth of deep neural networks (DNNs). GPU-based systems often suffer from poor energy efficiency because massively parallel computation increases memory accesses, creating memory-capacity, bandwidth, and latency bottlenecks. DLA-based systems mitigate these problems and improve the relevant metrics, but their flexibility remains a challenge. Several case studies of proposed DLAs have demonstrated that the choice of mapping method has a marked effect on energy consumption and delay. In this work, we analyze MAERI's role in addressing these issues and the impact of mapping methods on the challenges that arise when different trained DNN models are deployed on accelerators. We propose an algorithm for mapping and assigning virtual neurons (VNs) on the MAERI accelerator to improve its performance and cost. Simulation results show reductions in energy consumption and delay of approximately 21–92% and 14–21%, respectively, when AlexNet and VGG-16 are implemented on MAERI. A well-chosen mapping method significantly increases a DLA's performance and reduces its cost without redesigning its architecture, and the proposed VN-assignment approach supports different trained DNN models, increasing the flexibility of DLA-based systems.
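To make the idea of VN assignment concrete, the following is a minimal illustrative sketch, not the authors' algorithm: it greedily partitions MAERI's multiplier switches into equally sized virtual neurons and schedules a layer's output neurons over them round-robin. The names `num_mult_switches`, `vn_size`, and both functions are assumptions introduced for this example.

```python
def assign_virtual_neurons(num_mult_switches, vn_size):
    """Partition the multiplier-switch array into equally sized VNs.

    Each VN groups `vn_size` multiplier switches whose products are summed
    by the adder tree to produce one neuron's output. Returns a list of
    (start, end) switch-index ranges, one per VN; leftover switches idle.
    """
    if vn_size <= 0 or vn_size > num_mult_switches:
        raise ValueError("VN size must fit within the switch array")
    num_vns = num_mult_switches // vn_size
    return [(i * vn_size, (i + 1) * vn_size) for i in range(num_vns)]


def map_layer(num_neurons, vns):
    """Round-robin schedule of a layer's neurons over the available VNs.

    Returns one list per time step, each holding (vn_index, neuron_index)
    pairs computed in parallel during that step.
    """
    schedule = []
    for start in range(0, num_neurons, len(vns)):
        batch = [(v, start + v) for v in range(len(vns))
                 if start + v < num_neurons]
        schedule.append(batch)
    return schedule


# Example: 64 multiplier switches, 3x3 single-channel filters -> VN size 9.
vns = assign_virtual_neurons(64, 9)   # 7 VNs of 9 switches; 1 switch idle
steps = map_layer(20, vns)            # 20 neurons over 7 VNs -> 3 steps
```

In a real mapper, the VN size would follow from the layer's filter dimensions and the schedule from the chosen dataflow; this sketch only shows the partition-then-schedule structure.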
Cite this article
Reshadi, M., Mirmahaleh, S.Y.H. Mapping and virtual neuron assignment algorithms for MAERI accelerator. J Supercomput 78, 238–257 (2022). https://doi.org/10.1007/s11227-021-03893-3