Hardware Based Loop Optimization for CGRA Architectures

Sunny, Chilankamol; Das, Satyajit; Martin, Kevin J. M.; Coussy, Philippe

doi:10.1007/978-3-030-79025-7_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12700)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

With the increasing demand for high-performance computing in application domains with stringent power budgets, coarse-grained reconfigurable array (CGRA) architectures have become a popular choice among researchers and manufacturers. Loops are the hot-spots of kernels running on CGRAs, and hence several techniques have been devised to optimize loop execution. However, works in this direction are predominantly software-based solutions. This paper addresses the optimization opportunities at a deeper level and introduces a hardware-based loop control mechanism that can support arbitrarily nested loops up to four levels deep. The major contributions of this work are a lightweight Hardware Loop Block (HLB) for CGRAs that eliminates the control instruction overhead of loops, and an acyclic graph transformation that removes loop branches from the application's control data flow graph (CDFG). When tested on a set of kernels chosen from various application domains, the design achieved a maximum speed-up of 1.9× and an average speed-up of 1.5× over the conventional approach. The total number of instructions executed is reduced to half for almost all the kernels, with an area overhead of 2.6% and a power consumption overhead of 0.8%.
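To make the idea of removing loop control instructions concrete, the following is a minimal software sketch and not the authors' HLB design. It models a nested loop counter that advances indices outside the instruction stream, and compares instruction counts for a perfectly nested loop with and without per-iteration control instructions. The names `LoopCounterModel`, `iterate`, and `instruction_counts`, and the assumed cost of three control instructions per iteration (increment, compare, branch), are illustrative assumptions that do not come from the paper.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Sequence, Tuple

MAX_LEVELS = 4  # the abstract states support for up to four nesting levels


@dataclass
class LoopCounterModel:
    """Hypothetical model of a nested loop counter (not the paper's HLB)."""

    trip_counts: List[int]  # iteration count per level, outermost first
    indices: List[int] = field(init=False)
    done: bool = field(init=False)

    def __post_init__(self) -> None:
        if not 1 <= len(self.trip_counts) <= MAX_LEVELS:
            raise ValueError("between 1 and 4 loop levels are modelled")
        if any(n <= 0 for n in self.trip_counts):
            raise ValueError("trip counts must be positive")
        self.indices = [0] * len(self.trip_counts)
        self.done = False

    def step(self) -> None:
        """Advance the innermost index, carrying into outer levels on wrap."""
        for level in reversed(range(len(self.indices))):
            self.indices[level] += 1
            if self.indices[level] < self.trip_counts[level]:
                return
            self.indices[level] = 0  # this level wrapped; carry outward
        self.done = True  # the outermost level wrapped: the nest is finished


def iterate(trip_counts: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return every index tuple the counter visits, in execution order."""
    counter = LoopCounterModel(list(trip_counts))
    visited = []
    while not counter.done:
        visited.append(tuple(counter.indices))
        counter.step()
    return visited


def instruction_counts(
    trip_counts: Sequence[int], body_ops: int, overhead_ops: int = 3
) -> Tuple[int, int]:
    """Return (with_control, without_control) instruction counts.

    Assumes a perfectly nested loop whose body sits in the innermost level,
    and that software loop control costs `overhead_ops` instructions per
    iteration of every level. Both are simplifying assumptions.
    """
    body_total = body_ops * prod(trip_counts)
    # Level k runs once per iteration of all levels up to and including k.
    control_total = overhead_ops * sum(
        prod(trip_counts[: k + 1]) for k in range(len(trip_counts))
    )
    return body_total + control_total, body_total


if __name__ == "__main__":
    print(iterate((2, 3)))
    print(instruction_counts((2, 3), body_ops=3))
```

Under these assumptions, the share of instructions spent on loop control grows as the loop body gets smaller, which is the kind of overhead a hardware loop mechanism targets. The counts produced by this toy model are not the paper's measurements.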

Author information

Authors and Affiliations

IIT Palakkad, Palakkad, Kerala, India
Chilankamol Sunny & Satyajit Das
Univ. Bretagne-Sud, UMR 6285, Lab-STICC, 56100, Lorient, France
Kevin J. M. Martin & Philippe Coussy

Corresponding author

Correspondence to Chilankamol Sunny.

Editor information

Editors and Affiliations

IRISA, University of Rennes 1, Rennes, France
Steven Derrien
Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
INESC-ID, Lisboa, Portugal
Pedro C. Diniz
ENSSAT, University of Rennes 1, Lannion, France
Daniel Chillet

Cite this paper

Sunny, C., Das, S., Martin, K.J.M., Coussy, P. (2021). Hardware Based Loop Optimization for CGRA Architectures. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_5

DOI: https://doi.org/10.1007/978-3-030-79025-7_5
Published: 23 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science, Computer Science (R0)
