Abstract
In recent years, various deep learning accelerators (DLAs) have been proposed to address the challenges raised by the growing depth of deep neural networks (DNNs). GPU-based systems often suffer from poor energy efficiency because massively parallel computation increases memory accesses, creating memory-capacity, bandwidth, and latency bottlenecks. DLA-based systems mitigate these problems and improve the relevant metrics, but their flexibility remains a challenge. Several case studies of proposed DLAs have demonstrated that the choice of mapping method has a marked effect on energy consumption and delay. In this work, we analyze MAERI's role in addressing these issues and the impact of mapping methods on the challenges that arise when different trained DNN models are deployed on accelerators. We propose an algorithm for mapping and assigning virtual neurons (VNs) on the MAERI accelerator to improve its performance and cost. Simulation results show reductions in energy consumption and delay of approximately 21–92% and 14–21%, respectively, when AlexNet and VGG-16 are implemented on MAERI. A well-chosen mapping method significantly increases a DLA's performance and reduces its cost without redesigning its architecture, and the proposed VN-assignment approach supports different trained DNN models, increasing the flexibility of DLA-based systems.
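To make the idea of VN assignment concrete, the following is a minimal illustrative sketch, not the authors' algorithm: it greedily partitions MAERI's multiplier switches into equally sized virtual neurons and schedules a layer's output neurons over them round-robin. The names `num_mult_switches`, `vn_size`, and both functions are assumptions introduced for this example.

```python
def assign_virtual_neurons(num_mult_switches, vn_size):
    """Partition the multiplier-switch array into equally sized VNs.

    Each VN groups `vn_size` multiplier switches whose products are summed
    by the adder tree to produce one neuron's output. Returns a list of
    (start, end) switch-index ranges, one per VN; leftover switches idle.
    """
    if vn_size <= 0 or vn_size > num_mult_switches:
        raise ValueError("VN size must fit within the switch array")
    num_vns = num_mult_switches // vn_size
    return [(i * vn_size, (i + 1) * vn_size) for i in range(num_vns)]


def map_layer(num_neurons, vns):
    """Round-robin schedule of a layer's neurons over the available VNs.

    Returns one list per time step, each holding (vn_index, neuron_index)
    pairs computed in parallel during that step.
    """
    schedule = []
    for start in range(0, num_neurons, len(vns)):
        batch = [(v, start + v) for v in range(len(vns))
                 if start + v < num_neurons]
        schedule.append(batch)
    return schedule


# Example: 64 multiplier switches, 3x3 single-channel filters -> VN size 9.
vns = assign_virtual_neurons(64, 9)   # 7 VNs of 9 switches; 1 switch idle
steps = map_layer(20, vns)            # 20 neurons over 7 VNs -> 3 steps
```

In a real mapper, the VN size would follow from the layer's filter dimensions and the schedule from the chosen dataflow; this sketch only shows the partition-then-schedule structure.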
Cite this article
Reshadi, M., Mirmahaleh, S.Y.H. Mapping and virtual neuron assignment algorithms for MAERI accelerator. J Supercomput 78, 238–257 (2022). https://doi.org/10.1007/s11227-021-03893-3