Accelerator Design with High-Level Synthesis

Pilato, Christian; Soldavini, Stephanie

doi:10.1007/978-981-15-6401-7_19-1

Abstract

Specialized accelerators can exploit spatial parallelism on both operations and data thanks to a dedicated microarchitecture that makes better use of hardware resources. Designers need to describe such components (including the resources, their interconnections, and the control logic) in hardware description languages compatible with synthesis tools. This process requires hardware design skills that are uncommon among software programmers. To boost the use of spatial accelerators, software programmers need automated methods, like high-level synthesis (HLS), to specify hardware blocks in high-level languages and automatically translate their specifications into the corresponding hardware descriptions ready for synthesis. While HLS is a key enabling technology for the design of complex hardware/software architectures, developing efficient spatial accelerators requires HLS methods that co-optimize performance and hardware cost with a hardware/software co-design approach. In this chapter, we present the current state of the art in high-level synthesis, covering all steps to create the specialized microarchitecture of an accelerator. We also discuss outstanding challenges that can be addressed with the use of HLS.

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy
Christian Pilato & Stephanie Soldavini

Corresponding author

Correspondence to Christian Pilato.

Editor information

Editors and Affiliations

School of Computer Science & Engineering, Nanyang Technological University, Singapore, Singapore
Anupam Chattopadhyay

Section Editor information

Center for Advancing Electronics Dresden, TU Dresden, Georg-Schumannstr. 7A, 01187, Dresden, Germany
Jeronimo Castrillon
2424 Raven Road, 94566, Pleasanton, CA, USA
Grant Martin, M.Math

About this entry

Cite this entry

Pilato, C., Soldavini, S. (2022). Accelerator Design with High-Level Synthesis. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_19-1

DOI: https://doi.org/10.1007/978-981-15-6401-7_19-1
Published: 27 January 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6401-7
Online ISBN: 978-981-15-6401-7
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
