Design Methodology for Offloading Software Executions to FPGA

  • Tomasz Patyk
  • Perttu Salmela
  • Teemu Pitkänen
  • Pekka Jääskeläinen
  • Jarmo Takala


A field-programmable gate array (FPGA) is a flexible solution for offloading part of the computation from a processor. In particular, it can be used to accelerate the execution of a computationally heavy part of a software application, e.g., in DSP, where small kernels are repeated often. Since application code for a processor is software, a design methodology is needed to convert the code into a hardware implementation suitable for the FPGA. In this paper, we propose a design method that uses the Transport Triggered Architecture (TTA) processor template and the TTA-based Co-design Environment toolset to automate the design process. With software as the starting point, we generate an RTL implementation of an application-specific TTA processor together with the hardware/software interfaces required to offload computations from the system's main processor. To exemplify what the integration of the customized TTA with a new platform could look like, we describe the process of developing the required interfaces from scratch. Finally, we show how to take advantage of the scalability of the TTA processor to meet platform- and application-specific requirements.


Application-specific integrated circuits; Hardware accelerator; Computer aided engineering; System-on-a-chip; Coprocessors; Field programmable gate arrays



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Tomasz Patyk (1)
  • Perttu Salmela (1)
  • Teemu Pitkänen (1)
  • Pekka Jääskeläinen (1)
  • Jarmo Takala (1)

  1. Department of Computer Systems, Tampere University of Technology, Tampere, Finland
