Abstract
Dynamic binary translation (DBT) transforms machine code at runtime into an optimized form. Its goals can include cross-platform compatibility, better energy efficiency, or improved performance. The goal of this work is to improve performance by executing performance-critical parts of the binary code on a Coarse-Grained Reconfigurable Array (CGRA). We show how the CGRA is integrated into the system and explain how performance-critical parts of the binary code can be identified. We demonstrate the feasibility of dynamic binary translation from the RISC-V ISA to a CGRA, give details about the employed optimizations, and show that the performance of a whole benchmark set can be improved by a factor of 1.7 without any user intervention.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wirsch, R., Hochberger, C. (2021). Towards Transparent Dynamic Binary Translation from RISC-V to a CGRA. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_8
DOI: https://doi.org/10.1007/978-3-030-81682-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81681-0
Online ISBN: 978-3-030-81682-7
eBook Packages: Computer Science, Computer Science (R0)