Hardware Based Loop Optimization for CGRA Architectures

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2021)

Abstract

With the increasing demand for high-performance computing in application domains with stringent power budgets, coarse-grained reconfigurable array (CGRA) architectures have become a popular choice among researchers and manufacturers. Loops are the hot spots of kernels running on CGRAs, and several techniques have therefore been devised to optimize loop execution. However, work in this direction consists predominantly of software-based solutions. This paper addresses the optimization opportunities at a deeper level and introduces a hardware-based loop control mechanism that supports arbitrarily nested loops of up to four levels. The major contributions of this work are a lightweight Hardware Loop Block (HLB) for CGRAs that eliminates the control instruction overhead of loops, and an acyclic graph transformation that removes loop branches from the application's control data flow graph (CDFG). When tested on a set of kernels chosen from various application domains, the design achieved a maximum speed-up of 1.9\(\times\) and an average speed-up of 1.5\(\times\) over the conventional approach. The total number of instructions executed is halved for almost all the kernels, with area and power consumption overheads of 2.6% and 0.8%, respectively.
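The idea of moving loop control out of the instruction stream can be illustrated with a small Python sketch. It is not the authors' HLB design, which the abstract does not detail. The class `HardwareLoopModel`, the two counting functions, and the figure of three control instructions per iteration (increment, compare, branch) are all hypothetical choices made for illustration. Only the four-level nesting limit comes from the abstract.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

MAX_LEVELS = 4  # nesting limit stated in the abstract


@dataclass
class HardwareLoopModel:
    """Toy model: one counter per nesting level, advanced without instructions."""

    trip_counts: List[int]  # outermost loop first

    def __post_init__(self) -> None:
        if not 1 <= len(self.trip_counts) <= MAX_LEVELS:
            raise ValueError("between 1 and 4 nesting levels are supported")
        if any(n <= 0 for n in self.trip_counts):
            raise ValueError("trip counts must be positive")

    def iterations(self) -> Iterator[Tuple[int, ...]]:
        """Yield index tuples in the order a software loop nest would visit them."""
        counters = [0] * len(self.trip_counts)
        while True:
            yield tuple(counters)
            # Advance the innermost counter and carry outward on wrap-around.
            level = len(counters) - 1
            while level >= 0:
                counters[level] += 1
                if counters[level] < self.trip_counts[level]:
                    break
                counters[level] = 0
                level -= 1
            if level < 0:  # the outermost counter wrapped: the nest is done
                return


def software_instruction_count(trip_counts: List[int], body_ops: int,
                               control_ops_per_iter: int = 3) -> int:
    """Instructions executed when every loop level pays for its own control.

    control_ops_per_iter is an assumed cost (increment, compare, branch).
    """
    total = 0
    iterations_so_far = 1
    for n in trip_counts:
        iterations_so_far *= n  # times this level's control code runs
        total += iterations_so_far * control_ops_per_iter
    return total + iterations_so_far * body_ops


def hardware_instruction_count(trip_counts: List[int], body_ops: int) -> int:
    """Instructions executed when loop control costs no instructions at all."""
    total_iterations = 1
    for n in trip_counts:
        total_iterations *= n
    return total_iterations * body_ops


if __name__ == "__main__":
    nest = [2, 3]
    print(list(HardwareLoopModel(nest).iterations()))
    print(software_instruction_count(nest, body_ops=3))
    print(hardware_instruction_count(nest, body_ops=3))
```

In this model the relative saving depends entirely on the assumed ratio of control to body instructions. It shows the mechanism only and does not reproduce the speed-up or overhead figures reported above.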

Author information

Corresponding author

Correspondence to Chilankamol Sunny.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sunny, C., Das, S., Martin, K.J.M., Coussy, P. (2021). Hardware Based Loop Optimization for CGRA Architectures. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_5

  • DOI: https://doi.org/10.1007/978-3-030-79025-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79024-0

  • Online ISBN: 978-3-030-79025-7

  • eBook Packages: Computer Science, Computer Science (R0)
