Accelerator Design with High-Level Synthesis

  • Living reference work entry
Handbook of Computer Architecture

Abstract

Specialized accelerators can exploit spatial parallelism across both operations and data, thanks to a dedicated microarchitecture that makes better use of hardware resources. Designers must describe such components (including the resources, their interconnections, and the control logic) in hardware description languages compatible with synthesis tools. This process requires hardware design skills that are uncommon among software programmers. To boost the use of spatial accelerators, software programmers need automated methods, such as high-level synthesis (HLS), that let them specify hardware blocks in high-level languages and automatically translate those specifications into the corresponding hardware descriptions ready for synthesis. While HLS is a key enabling technology for the design of complex hardware/software architectures, developing efficient spatial accelerators requires HLS methods that co-optimize performance and hardware cost through a hardware/software co-design approach. In this chapter, we present the current state of the art in high-level synthesis, covering all the steps needed to create the specialized microarchitecture of an accelerator. We also discuss outstanding challenges that can be addressed with the use of HLS.

Author information

Corresponding author

Correspondence to Christian Pilato.

Copyright information

© 2022 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry

Pilato, C., Soldavini, S. (2022). Accelerator Design with High-Level Synthesis. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_19-1

  • DOI: https://doi.org/10.1007/978-981-15-6401-7_19-1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6401-7

  • Online ISBN: 978-981-15-6401-7

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
