Big Data and HPC Acceleration with Vivado HLS

  • Moritz Schmid
  • Christian Schmitt
  • Frank Hannig
  • Gorker Alp Malazgirt
  • Nehir Sonmez
  • Arda Yurdakul
  • Adrian Cristal

Abstract

Recent years have seen a new generation of high-level synthesis (HLS) tools that not only generate hardware architectures from behavioral hardware models but also perform synthesis directly from algorithms specified in high-level languages (HLLs). One reason for this development is the ever-growing popularity of reconfigurable logic, which aims to provide the performance and energy efficiency of integrated circuits with a flexibility close to that of software.

  150. [LCP+11]
    O. Lindtjorn, R. Clapp, O. Pell, H. Fu, M. Flynn, O. Mencer, Beyond traditional microprocessors for geoscience high-performance computing applications. IEEE Micro 31(2), 41–49 (2011)CrossRefGoogle Scholar
  151. [LNS15]
    C. Liu, H. Ng, H. So, Automatic nested loop acceleration on FPGAs using soft CGRA overlay, in Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP), September 2015, pp. 13–18Google Scholar
  152. [LLV15]
    LLVM, LLVM - Low Level Virtual Machine (2015), http://www.llvm.org. Accessed 1 April 2015 [Online]
  153. [LP09a]
    E. Lübbers, M. Platzner, Cooperative multithreading in dynamically reconfigurable systems, in Proceedings of the IEEE International Conference on Field Programmable Logic and Applications (FPL) (IEEE, New York, 2009), pp. 1–4Google Scholar
  154. [LP09b]
    E. Lübbers, M. Platzner, ReconOS: multithreaded programming for reconfigurable computers. ACM Trans. Embed. Comput. Syst. 9(1), 8:1–8:33 (2009)Google Scholar
  155. [LMVV05]
    R. Lysecky, K. Miller, F. Vahid, K. Vissers, Firm-core virtual FPGA for just-in-time FPGA compilation, in Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays (FPGA) (ACM, New York, 2005), p. 271CrossRefGoogle Scholar
  156. [MSY+15]
    G. Malazgirt, N. Sonmez, A. Yurdakul, O. Unsal, A. Cristal, High level synthesis based hardware accelerator design for processing SQL queries, in Proceedings of the FPGAworld Conference, September 2015, pp. 1–6Google Scholar
  157. [MS09]
    G. Martin, G. Smith, High-level synthesis: Past, present, and future. IEEE Des. Test Comput. 26(4), 18–25 (2009)CrossRefGoogle Scholar
  158. [MJA13]
    A. Martin, D. Jamsek, K. Agarwal, FPGA-based application acceleration: case study with GZIP compression/decompression streaming engine, in Proceedings of the International Conference on Computer-Aided Design (ICCAD) (2013)Google Scholar
  159. [MSF12]
    E. Matthews, L. Shannon, A. Fedorova, Polyblaze: from one to many bringing the MicroBlaze into the multicore era with linux SMP support, in International Conference on Field Programmable Logic and Applications, August 2012, pp. 224–230Google Scholar
  160. [ME95]
    L. McMurchie, C. Ebeling, Pathfinder: a negotiation-based performance-driven router for FPGAs, in Proceedings of the ACM third International Symposium on Field-Programmable Gate Arrays (FPGA) (1995), pp. 111–117Google Scholar
  161. [MVG+12]
    W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, D. Stroobandt, An overview of today’s high-level synthesis tools. Des. Autom. Embed. Syst. 16(3), 31–51 (2012)CrossRefGoogle Scholar
  162. [MHT+12]
    R. Membarth, F. Hannig, J. Teich, M. Körner, W. Eckert, Generating device-specific GPU code for local operators in medical imaging, in Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2012 (IEEE, New York, 2012), pp. 569–581Google Scholar
  163. [MRHT14]
    R. Membarth, O. Reiche, F. Hannig, J. Teich, Code generation for embedded heterogeneous architectures on Android, in Proceedings of the Conference on Design, Automation and Test in Europe (DATE), March 2014, pp. 1–6Google Scholar
  164. [MRH+16]
    R. Membarth, O. Reiche, F. Hannig, J. Teich, M. Körner, W. Eckert, HIPAcc: a domain-specific language and compiler for image processing.IEEE Trans. Parallel Distrib. Syst. 27(1), 210–224 (2016)Google Scholar
  165. [Mer08]
    M. Meredith, High-Level SystemC Synthesis with Forte’s Cynthesizer, Chap. 5, pp. 75–97; in Coussy, Morawiec [CM08], 1st edn. (2008)Google Scholar
  166. [Mic94]
    G. Micheli, Synthesis and Optimization of Digital Circuits, 1st edn. (McGraw-Hill Higher Education, New York, 1994)Google Scholar
  167. [Moo65]
    G. Moore, Cramming more components onto integrated circuits. Electronics 38(8), 114–117 (1965)Google Scholar
  168. [Muc97]
    S. Muchnick, Advanced Compiler Design and Implementation (Morgan Kaufmann Publishers, San Francisco, 1997)Google Scholar
  169. [MTA10]
    R. Mueller, J. Teubner, G. Alonso, Glacier: a query-to-hardware compiler, in Proceedings of the ACM SIGMOD International Conference on Management of Data (2010), pp. 1159–1162Google Scholar
  170. [NI14]
    W. Najjar, P. Ienne, Reconfigurable computing. IEEE Micro 34(1), 4–6 (2014)CrossRefGoogle Scholar
  171. [NBD+03]
    W. Najjar, A. Böhm, B. Draper, J. Hammes, R. Rinker, J. Beveridge, M. Chawathe, C. Ross, High-level language abstraction for reconfigurable computing. IEEE Comput. 36(8), 63–69 (2003)CrossRefGoogle Scholar
  172. [Nat15a]
    National Instruments, LabVIEW Communications System Design Suite Overview (2015), http://www.ni.com/white-paper/52502/en/. Accessed 4 Aug 2015 [Online]
  173. [Nat15b]
    National Instruments, LabVIEW FPGA (2015), http://www.ni.com/fpga/. Accessed 4 Aug 2015 [Online]
  174. [Nat15c]
    National Instruments, LabVIEW RIO Architecture (2015), http://www.ni.com/white-paper/10894/en/. Accessed 4 Aug 2015 [Online]
  175. [Nat15d]
    National Instruments, NI myRIO (2015), http://www.ni.com/myrio/. Accessed 4 Aug 2015 [Online]
  176. [Nat15e]
    National Instruments, NI Scan Engine Advanced I/O Access (2015), http://www.ni.com/white-paper/8071/en/. Accessed 4 Aug 2015 [Online]
  177. [Nik04]
    R. Nikhil, Bluespec system verilog: efficient, correct RTL from high level specifications, in Proceedings of the Second ACM and IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE), June 2004, pp. 69–70Google Scholar
  178. [Nik08]
    R. Nikhil, Bluespec: A General-Purpose Approach to High-Level Synthesis Based on Parallel Atomic Transactions, Chap. 8, pp. 129–146; in Coussy, Morawiec [CM08], 1st edn. (2008)Google Scholar
  179. [NCV+03]
    V. Nollet, P. Coene, D. Verkest, S. Vernalde, R. Lauwereins, Designing an operating system for a heterogeneous reconfigurable SoC, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) (2003), pp. 1–7Google Scholar
  180. [OSV11]
    M. Odersky, L. Spoon, B. Venners, Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd edn. (Artima Incorporation, Walnut Creek, 2011)Google Scholar
  181. [Ope15a]
    OpenACC, OpenACC directives for accelerators (2015), http://www.openacc-standard.org/. Accessed 4 Aug 2015 [Online]
  182. [Ope15b]
    OpenMP, The OpenMP API Specification for Parallel Programming (2015), http://openmp.org/. Accessed 4 Aug 2015 [Online]
  183. [Pap03]
    I. Papaefstathiou, Titan II: an IPComp processor for 10Gbit/sec networks, in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, February 2003, pp. 234–235Google Scholar
  184. [PPA+13]
    A. Parashar, M. Pellauer, M. Adler, B. Ahsan, N. Crago, D. Lustig, V. Pavlov, A. Zhai, M. Gambhir, A. Jaleel, R. Allmon, R. Rayess, S. Maresh, J. Emer, Triggered instructions: a control paradigm for spatially-programmed architectures. ACM SIGARCH Comput. Archit. News 41(3), 142–153 (2013)CrossRefGoogle Scholar
  185. [PK89]
    P. Paulin, J. Knight, Force-directed scheduling for the behavioral synthesis of ASICs. IEEE Trans. Comput. Aided Des. 8(6), 661–679 (1989)CrossRefGoogle Scholar
  186. [PACE09]
    M. Pellauer, M. Adler, D. Chiou, J. Emer, Soft connections: addressing the hardware-design modularity problem, in Proceedings of the 27th ACM/IEEE Design Automation Conference (DAC) (ACM, New York, 2009), pp. 276–281Google Scholar
  187. [PSKK15]
    N. Pham, A. Singh, A. Kumar, M. Khin, Exploiting loop-array dependencies to accelerate the design space exploration with high level synthesis, in Proceedings of the Conference on Design, Automation and Test in Europe (DATE) (EDA Consortium, San Jose, 2015), pp. 157–162Google Scholar
  188. [Pou10]
    L.-N. Pouchet, Iterative optimization in the polyhedral model, PhD thesis, University of Paris-Sud 11, Orsay, France, 2010Google Scholar
  189. [PZSC13]
    L.-N. Pouchet, P. Zhang, P. Sadayappan, J. Cong, Polyhedral-based data reuse optimization for configurable computing, in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) (ACM, New York, 2013), pp. 29–38Google Scholar
  190. [PBKE12]
    K. Pulli, A. Baksheev, K. Kornyakov, V. Eruhimov, Real-time computer vision with OpenCV. Commun. ACM 55(6), 61–69 (2012)CrossRefGoogle Scholar
  191. [PCC+14]
    A. Putnam, A. Caulfield, E. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Xiao, D. Burger, A reconfigurable fabric for accelerating large-scale datacenter service, in Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture (ISCA) (IEEE, New York, 2014), pp. 13–24Google Scholar
  192. [Rau94]
    B. Rau, Iterative modulo scheduling: an algorithm for software pipelining loops, in Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO) (ACM, New York, 1994), pp. 63–74Google Scholar
  193. [Rau96]
    B. Rau, Iterative modulo scheduling, Technical report, HP, USA, 1996, http://www.hpl.hp.com/techreports/94/HPL-94-115.html Google Scholar
  194. [RSH+14]
    O. Reiche, M. Schmid, F. Hannig, R. Membarth, J. Teich, Code generation from a domain-specific language for C-based HLS of hardware accelerators, in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2014 (ACM, New York, 2014), pp. 1–10Google Scholar
  195. [RBK07]
    S. Rigler, W. Bishop, A. Kennings, FPGA-based lossless data compression using Huffman and LZ77 algorithms, in Proceedings of the Canadian Conference on and Computer Engineering (CCECE), April 2007, pp. 1235–1238Google Scholar
  196. [RNPL15]
    A. Rodchenko, A. Nisbet, A. Pop, M. Luján, Effective barrier synchronization on Intel Xeon Phi coprocessor, in Proceedings of the 21st International Conference on Parallel Processing (Euro-Par), August 2015. Lecture Notes in Computer Science (LNCS), vol. 9233, pp. 588–600Google Scholar
  197. [RVS+10]
    J. Robinson, S. Vafaee, J. Scobbie, M. Ritche, J. Rose, The Supersmall soft processor, in Proceedings of the Southern Programmable Logic Conference (SPL), March 2010, pp. 3–8Google Scholar
  198. [San12]
    Sandgate Technologies, Inc., GZIP/GUNZIP Silicon IP Family (2012), http://www.sandgate.com/new/static/QuickZIP%20Family%20Product%20Brief%20%28V1.2a%29.pdf. Accessed 4 Aug 2015 [Online]
  199. [SW10]
    B. Schafer, K. Wakabayashi, Design space exploration acceleration through operation clustering. IEEE Trans. Comput. Aided Des. 29(1), 153–157 (2010)CrossRefGoogle Scholar
  200. [SW12]
    B. Schafer, K. Wakabayashi, Divide and conquer high-level synthesis design space exploration. ACM Trans. Des. Autom. Electron. Syst. 17(3), 29:1–29:19 (2012)Google Scholar
  201. [STH+14]
    M. Schmid, A. Tanase, F. Hannig, J. Teich, V. Bhadouria, D. Ghoshal, Domain-specific augmentations for high-level synthesis, in Proceedings of the 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), June 2014 (IEEE, New York, 2014), pp. 173–177Google Scholar
  202. [SAHT14]
    M. Schmid, N. Apelt, F. Hannig, J. Teich, An image processing library for C-based high-level synthesis, in Proceedings of the 24th International Conference on Field Programmable Logic and Applications (FPL), September 2014 (IEEE, New York, 2014), pp. 1–4Google Scholar
  203. [SKH+14]
    C. Schmitt, S. Kuckuk, F. Hannig, H. Köstler, J. Teich, ExaSlang: a domain-specific language for highly scalable multigrid solvers, in Proceedings of the 4th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), November 2014 (IEEE Computer Society, Los Alamitos, 2014), pp. 42–51CrossRefGoogle Scholar
  204. [SSH+15]
    C. Schmitt, M. Schmid, F. Hannig, J. Teich, S. Kuckuk, H. Köstler, Generation of multigrid-based numerical solvers for FPGA accelerators, in Proceedings of the 2nd International Workshop on High-Performance Stencil Computations (HiStencils), January 2015, pp. 9–15Google Scholar
  205. [Sch96]
    J. Schutten, List scheduling revisited. Oper. Res. Lett. 18(4), 167–170 (1996)MATHCrossRefGoogle Scholar
  206. [SL12]
    A. Severance, G. Lemieux, VENICE: a compact vector processor for FPGA applications, in Proceedings of the International Conference on Field-Programmable Technology (FPT), December 2012, pp. 261–268Google Scholar
  207. [SBB06]
    S. Shukla, N. Bergmann, J. Becker, QUKU: a two-level reconfigurable architecture, in Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures, March 2006, pp. 1–6Google Scholar
  208. [Sin11]
    S. Singh, Computing without processors. Commun. ACM 54(8), 46–54 (2011)CrossRefGoogle Scholar
  209. [SB06]
    H. So, R. Brodersen, Improving usability of FPGA-based reconfigurable computers through operating system support, in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), August 2006, pp. 1–6Google Scholar
  210. [SB08]
    H. So, R. Brodersen, A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH. ACM Trans. Embed. Comput. Syst. 7(2), 14:1–14:28 (2008)Google Scholar
  211. [SS15]
    A. Soltani, S. Sharifian, An ultra-high throughput and fully pipelined implementation of AES algorithm on FPGA. Microprocess. Microsyst. 39(7), 480–493 (2015)CrossRefGoogle Scholar
  212. [SWP04]
    C. Steiger, H. Walder, M. Platzner, Operating systems for reconfigurable embedded platforms: online scheduling of real-time tasks. IEEE Trans. Comput. 53(11), 1392–1407 (2004)CrossRefGoogle Scholar
  213. [Ste04]
    F. Stein, Efficient computation of optical flow using the census transform, in Pattern Recognition. Lecture Notes in Computer Science (LNCS), vol. 3175 (Springer, Berlin, 2004), pp. 79–86Google Scholar
  214. [ST82]
    K. Stüben, U. Trottenberg, Multigrid methods: fundamental algorithms, model problem analysis and applications, in Multigrid Methods. Lecture Notes in Mathematics, vol. 960 (Springer, Berlin, 1982), pp. 1–176Google Scholar
  215. [Sum08]
    T. Summers, Hardware based GZIP Compression, Benefits and Applications (2008), http://www.comtechaha.com/Uploads/GZIP-Benefits-Apps.pdf. Accessed 4 Aug 2015 [Online]
  216. [TMK10]
    M. Tahghighi, M. Mousavi, P. Khadivi, Hardware implementation of a novel adaptive version of deflate compression algorithm, in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE), May 2010 (IEEE, New York, 2010), pp. 566–569Google Scholar
  217. [TB01]
    R. Tessier, W. Burleson, Reconfigurable computing for digital signal processing: a survey. J. VLSI Signal Process. Syst. Signal Image Video Technol. 28(1–2), 7–27 (2001)MATHCrossRefGoogle Scholar
  218. [TPD15]
    R. Tessier, K. Pocek, A. DeHon, Reconfigurable computing architectures. Proc. IEEE 103(3), 332–354 (2015)CrossRefGoogle Scholar
  219. [TBU00]
    P. Thevenaz, T. Blu, M. Unser, Interpolation revisited. IEEE Trans. Med. Imaging 19(7), 739–758 (2000)CrossRefGoogle Scholar
  220. [Tra08]
    Transaction Processing Performance Council, TPC-H Benchmark Specification (2008), http://www.tpc.org/tpch/spec/tpch2.6.0.pdf Google Scholar
  221. [TCJW97]
    S. Trimberger, D. Carberry, A. Johnson, J. Wong, A time-multiplexed FPGA, in Proceedings of the 5th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 1997 (IEEE Computer Society, Washington, DC, 1997), pp. 22–28Google Scholar
  222. [TGP07]
    J. Tripp, M. Gokhale, K. Peterson, Trident: from high-level language to hardware circuitry. Computer 40(3), 28–37 (2007)CrossRefGoogle Scholar
  223. [Van94]
    E. Van de Velde, Poisson solvers, in Concurrent Scientific Computing. Texts in Applied Mathematics, vol. 16 (Springer New York, 1994), pp. 183–216Google Scholar
  224. [VFHB14]
    E. Vermij, L. Fiorin, C. Hagleitner, K. Bertels, Exascale radio astronomy: can we ride the technology wave? in Proceedings of the 29th International Conference on Supercomputing (ISC), June 2014, ed. by J. Kunkel, T. Ludwig, H. Meuer. Lecture Notes in Computer Science (LNCS), vol. 8488 (Springer, Berlin, 2014), pp. 35–52Google Scholar
  225. [VPI05]
    M. Vuletic, L. Pozzi, P. Ienne, Seamless hardware-software integration in reconfigurable computing systems.IEEE Des. Test Comput. 22(2), 102–113 (2005)Google Scholar
  226. [WO00]
    K. Wakabayashi, T. Okamoto, C-based SoC design flow and EDA tools: an ASIC and system vendor perspective. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 19(12), 1507–1522 (2000)CrossRefGoogle Scholar
  227. [WS08]
    K. Wakabayashi, B. Schafer, “All-in-C” Behavioral Synthesis and Verification with CyberWorkBench, Chap. 7, pp. 113–127; in Coussy, Morawiec [CM08], 1st edn. (2008)Google Scholar
  228. [WC91]
    R. Walker, R. Camposano (eds.), A Survey of High Level Synthesis Systems (Kluwer Academic, Norwell, 1991)MATHGoogle Scholar
  229. [WZCC12]
    Y. Wang, P. Zhang, X. Cheng, J. Cong, An integrated and automated memory optimization flow for FPGA behavioral synthesis, in Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC) (IEEE, New York, 2012), pp. 257–262Google Scholar
  230. [WYZ+12]
    Y. Wang, J. Yan, X. Zhou, L. Wang, W. Luk, C. Peng, J. Tong, A partially reconfigurable architecture supporting hardware threads, in Proceedings of the International Conference on Field-Programmable Technology (FPT) (2012), pp. 269–276Google Scholar
  231. [WLZ+13]
    Y. Wang, P. Li, P. Zhang, C. Zhang, J. Cong, Memory partitioning for multidimensional arrays in high-level synthesis, in Proceedings of the 50th Annual Design Automation Conference (DAC) (ACM, New York, 2013), pp. 12:1–12:8Google Scholar
  232. [WLC14]
    Y. Wang, P. Li, J. Cong, Theory and algorithm for generalized memory partitioning in high-level synthesis, in Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) (ACM, New York, 2014), pp. 199–208Google Scholar
  233. [Wei84]
    R. Weicker, Dhrystone: a synthetic systems programming benchmark. Commun. ACM 27(10), 1013–1030 (1984)CrossRefGoogle Scholar
  234. [WSRM12]
    S. Weston, J. Spooner, S. Racanière, O. Mencer, Rapid computation of value and risk for derivatives portfolios. Concurr. Comput. 24(8), 880–894 (2012)CrossRefGoogle Scholar
  235. [Wik15]
    Wikipedia, AirPlay — Wikipedia, The Free Encyclopedia (2015), https://en.wikipedia.org/w/index.php?title=AirPlay&oldid=663198848. Accessed 4 Aug 2015 [Online]
  236. [WFW+94]
    R. Wilson, R. French, C. Wilson, S. Amarasinghe, J. Anderson, S. Tjiang, S.-W. Liao, C.-W. Tseng, M. Hall, M. Lam, J. Hennessy, SUIF: an infrastructure for research on parallelizing and optimizing compilers. ACM SIGPLAN Not. 29(12), 31–37 (1994)CrossRefGoogle Scholar
  237. [Win15]
    Wind River, Wind River Linux (2015), http://www.windriver.com/products/linux. Accessed 26 July 2015 [Online]
  238. [WBC14]
    F. Winterstein, S. Bayliss, G. Constantinides, Separation logic-assisted code transformations for efficient high-level synthesis, in Proceedings of the IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (IEEE, New York, 2014), pp. 1–8Google Scholar
  239. [WFY+15]
    F. Winterstein, K. Fleming, H.-J. Yang, S. Bayliss, G. Constantinides, MATCHUP: Memory abstractions for heap manipulating programs, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) (ACM, New York, 2015), pp. 136–145Google Scholar
  240. [WL91]
    M. Wolf, M. Lam, A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst. 2(4), 452–471 (1991)CrossRefGoogle Scholar
  241. [Wol96]
    M. Wolfe, High performance compilers for parallel computing (Addison-Wesley, Boston, 1996)MATHGoogle Scholar
  242. [WLP+14]
    L. Wu, A. Lottarini, T. Paine, M. Kim, K. Ross, Q100: the architecture and design of a database processing unit, in Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2014), pp. 255–268Google Scholar
  243. [Xil11]
    Xilinx Inc., Data2MEM User Guide (UG658, v13.3) (2011), http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_2/data2mem.pdf. Accessed 19 Sept 2012 [Online]
  244. [Xil15a]
    Xilinx Inc., MicroBlaze Processor Reference Guide (UG984, v2014.1) (2015)Google Scholar
  245. [Xil15b]
    Xilinx Inc., MicroBlaze Soft Processor Core (2015), http://www.xilinx.com/tools/microblaze.htm. Accessed 22 July 2015 [Online]
  246. [Xil15c]
    Xilinx Inc., OS and Libraries Document Collection (UG643, v2015.2) (2015)Google Scholar
  247. [Xil15d]
    Xilinx Inc., PetaLinux Tools (2015), http://www.xilinx.com/tools/petalinux-sdk.htm. Accessed 26 July 2015 [Online]
  248. [Xil15e]
    Xilinx, Inc., Vivado High-Level Synthesis (2015), http://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html. Accessed 12 July 2015[Online]
  249. [Xil15f]
    Xilinx Inc., Zynq-7000 All Programmable SoC Overview (DS190, v1.8) (2015)Google Scholar
  250. [Xil15g]
    Xilinx Inc., Zynq UltraScale+ MPSoC Product Tables and Product Selection Guide (2015)Google Scholar
  251. [YFAE14]
    H. Yang, K. Fleming, M. Adler, J. Emer, LEAP shared memories: automating the construction of FPGA coherent memories, in Proceedings of the Symposium on Field-Programmable Custom Computing Machines (FCCM) (IEEE, New York, 2014), pp. 117–124Google Scholar
  252. [YSR09]
    P. Yiannacouras, J. Steffan, J. Rose, Fine-grain performance scaling of soft vector processors, in Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES) (ACM, New York, 2009), pp. 97–106Google Scholar
  253. [YKL15]
    M. Yue, D. Koch, G. Lemieux, Rapid overlay builder for Xilinx FPGAs, in Proceedings of the IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2015, pp. 17–20Google Scholar
  254. [ZFJ+08]
    Z. Zhang, Y. Fan, W. Jiang, G. Han, C. Yang, J. Cong, AutoPilot: A Platform-Based ESL Synthesis System, Chap. 6, pp. 99–112; in Coussy, Morawiec [CM08]Google Scholar
  255. [ZL13]
    Z. Zhang, B. Liu, SDC-based modulo scheduling for pipeline synthesis, in Proceedings of the International Conference on Computer-Aided Design (ICCAD) (2013), pp. 211–218Google Scholar
  256. [ZL77]
    J. Ziv, A. Lempel, A universal algorithm for sequential data compression.IEEE Trans. Inf. Theory 23(3), 337–343 (1977)Google Scholar
  257. [ZLC+13]
    W. Zuo, P. Li, D. Chen, L.-N. Pouchet, S. Zhong, J. Cong, Improving polyhedral code generation for high-level synthesis, in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) (IEEE Press, Piscataway, 2013), pp. 1–10Google Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Moritz Schmid (1)
  • Christian Schmitt (1)
  • Frank Hannig (1)
  • Gorker Alp Malazgirt (2)
  • Nehir Sonmez (3, 4, 5)
  • Arda Yurdakul (2)
  • Adrian Cristal (3, 4, 5)
  1. Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
  2. Bogazici University, Istanbul, Turkey
  3. Barcelona Supercomputing Center, Barcelona, Spain
  4. Centro Superior de Investigaciones Cientificas (IIIA-CSIC), Barcelona, Spain
  5. Universitat Politecnica de Catalunya, Barcelona, Spain
