Abstract
Specialized accelerators can exploit spatial parallelism over both operations and data thanks to a dedicated microarchitecture that makes better use of the hardware resources. Designers must describe such components (including the resources, their interconnections, and the control logic) in hardware description languages compatible with synthesis tools. This process requires hardware design skills that are uncommon among software programmers. To boost the adoption of spatial accelerators, software programmers need automated methods, such as high-level synthesis (HLS), that let them specify hardware blocks in high-level languages and automatically translate these specifications into the corresponding hardware descriptions, ready for synthesis. HLS is a key enabling technology for the design of complex hardware/software architectures, but developing efficient spatial accelerators also requires efficient HLS methods that co-optimize performance and hardware cost within a hardware/software co-design approach. In this chapter, we present the current state of the art in high-level synthesis, covering all the steps needed to create the specialized microarchitecture of an accelerator. We also discuss the outstanding challenges that HLS can help address.
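To make "spatial parallelism over operations" and the performance/cost trade-off more concrete, the following minimal Python sketch applies as-soon-as-possible (ASAP) scheduling, one classic HLS scheduling technique, to a small dataflow graph. It assumes that every operation completes in exactly one clock cycle and it ignores resource constraints. The graph `DFG` and the helper names `asap_schedule` and `functional_units` are hypothetical and are not taken from any specific HLS tool. Independent operations land in the same control step, which is where the speedup comes from. The peak number of concurrent operations of each type gives the functional units the datapath must instantiate, which is the hardware cost.

```python
from collections import Counter, defaultdict

# Hypothetical dataflow graph for y = (a*b + c*d) * (e - f).
# Each node maps to (operation type, list of predecessor nodes whose results it consumes).
DFG = {
    "mul1": ("mul", []),
    "mul2": ("mul", []),
    "sub1": ("sub", []),
    "add1": ("add", ["mul1", "mul2"]),
    "mul3": ("mul", ["add1", "sub1"]),
}

def asap_schedule(dfg):
    """Assign each node the earliest control step allowed by its data dependencies.

    Assumes unit latency per operation and unlimited functional units.
    """
    step = {}
    while len(step) < len(dfg):
        # Nodes whose predecessors have all been scheduled are ready.
        ready = [n for n, (_, preds) in dfg.items()
                 if n not in step and all(p in step for p in preds)]
        if not ready:
            raise ValueError("the graph contains a dependency cycle")
        for n in ready:
            preds = dfg[n][1]
            # Start one step after the latest predecessor (step 0 for graph inputs).
            step[n] = 1 + max((step[p] for p in preds), default=-1)
    return step

def functional_units(dfg, step):
    """Minimum number of units per operation type: peak usage over all control steps."""
    usage = defaultdict(Counter)  # control step -> count of each operation type
    for n, s in step.items():
        usage[s][dfg[n][0]] += 1
    units = Counter()
    for counts in usage.values():
        for kind, k in counts.items():
            units[kind] = max(units[kind], k)
    return dict(units)

step = asap_schedule(DFG)
latency = max(step.values()) + 1
print(step)     # {'mul1': 0, 'mul2': 0, 'sub1': 0, 'add1': 1, 'mul3': 2}
print(latency, functional_units(DFG, step))  # 3 {'mul': 2, 'sub': 1, 'add': 1}
```

The three independent operations in step 0 run concurrently on two multipliers and one subtractor, so the expression completes in three cycles rather than the five a strictly one-operation-per-cycle datapath would need. Practical HLS tools refine this with resource-constrained scheduling (for example, list scheduling or SDC-based formulations), which can trade extra latency for fewer functional units.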
Copyright information
© 2022 Springer Nature Singapore Pte Ltd.
About this entry
Cite this entry
Pilato, C., Soldavini, S. (2022). Accelerator Design with High-Level Synthesis. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_19-1
DOI: https://doi.org/10.1007/978-981-15-6401-7_19-1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6401-7
Online ISBN: 978-981-15-6401-7
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering