Architecture for Transparent Binary Acceleration of Loops with Memory Accesses

  • Nuno Paulino
  • João Canas Ferreira
  • João M. P. Cardoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7806)


This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are migrated to a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain that automatically generates an RPU tailored to the execution of one or more Megablocks detected offline. Switching between hardware and software execution is transparent and requires no modifications to source code or executable binaries. Our approach was evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. A memory-sharing mechanism gives the RPU access to the GPP’s data memory, which allows the acceleration of Megablocks that contain load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43× is achieved, and a potential speedup of 2.09× is predicted for an implementation using a low-overhead interface for communication between the GPP and the RPU.
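The following Python sketch illustrates what a "repetitive instruction trace" means. It scans a trace of executed instruction addresses and looks for the shortest address pattern that repeats back-to-back at the end of that trace, which is how a loop body shows up at run time. This is a simplified, hypothetical detector. The function name `find_repeating_pattern` and its parameters `max_size` and `min_repeats` are assumptions, and it is not the authors' offline Megablock detection toolchain.

```python
from typing import List, Optional, Sequence


def find_repeating_pattern(
    trace: Sequence[int], max_size: int = 32, min_repeats: int = 2
) -> Optional[List[int]]:
    """Hypothetical sketch: return the shortest list of instruction addresses
    that appears at least `min_repeats` times back-to-back at the end of
    `trace`, or None if no pattern of at most `max_size` addresses does."""
    n = len(trace)
    for size in range(1, max_size + 1):
        # Not enough executed instructions to hold min_repeats copies.
        if size * min_repeats > n:
            break
        # Candidate pattern: the most recent `size` addresses.
        pattern = list(trace[n - size:])
        # Each earlier window of the same size must match the candidate.
        if all(
            list(trace[n - (r + 1) * size: n - r * size]) == pattern
            for r in range(1, min_repeats)
        ):
            return pattern
    return None


# A short prologue followed by three iterations of a 4-instruction loop body.
trace = [0x00, 0x04] + [0x100, 0x104, 0x108, 0x10C] * 3
print([hex(a) for a in find_repeating_pattern(trace)])
```

The pattern is returned in the rotation that ends at the last traced address. A real detector would also need to decide where the repeating block starts and when it is worth offloading. Both of these aspects are left out here.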


Keywords: reconfigurable processor, memory access, Megablock, instruction trace, MicroBlaze, hardware acceleration, FPGA





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nuno Paulino (1)
  • João Canas Ferreira (1)
  • João M. P. Cardoso (2)
  1. INESC TEC and Faculty of Engineering, University of Porto, Portugal
  2. INESC TEC and Department of Informatics Engineering, Faculty of Engineering, University of Porto, Portugal
