An energy efficient multi-target binary translator for instruction and data level parallelism exploitation

Knorst, Tiago; Vicenzi, Julio; Jordan, Michael G.; Almeida, Jonathan H. de; Korol, Guilherme; Beck, Antonio C. S.; Rutzig, Mateus B.

doi:10.1007/s10617-021-09258-6

Published: 14 January 2022

Volume 26, pages 55–82, (2022)

Design Automation for Embedded Systems

Tiago Knorst¹,
Julio Vicenzi²,
Michael G. Jordan¹,
Jonathan H. de Almeida²,
Guilherme Korol¹,
Antonio C. S. Beck¹ &
…
Mateus B. Rutzig ORCID: orcid.org/0000-0002-2836-2009²

Abstract

Embedded devices, from smartphones to home appliances, are omnipresent in our daily routine and run both data- and control-oriented applications. To maximize the energy-performance tradeoff, data- and instruction-level parallelism (DLP and ILP) are exploited by using superscalar processors and specific accelerators. However, because such devices face severe time-to-market constraints, binary compatibility should be maintained to avoid recurrent engineering effort, a requirement that current embedded processors do not address. This work examines a set of embedded applications and shows the need for concurrent ILP and DLP exploitation. To that end, we propose a Hybrid Multi-Target Binary Translator (HMTBT) that transparently exploits ILP and DLP by using a CGRA and an ARM NEON engine as target accelerators. Results show that HMTBT transparently achieves 24% performance improvements and 54% energy savings over an out-of-order (OoO) superscalar processor coupled to an ARM NEON engine. Compared with decoupled binary translators using the same accelerator with the same ILP and DLP capabilities, the proposed approach improves performance by 10% and energy by 24%.

Author information

Authors and Affiliations

Institute of Informatics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
Tiago Knorst, Michael G. Jordan, Guilherme Korol & Antonio C. S. Beck
Electronics and Computing Department, Universidade Federal de Santa Maria (UFSM), Santa Maria, Brazil
Julio Vicenzi, Jonathan H. de Almeida & Mateus B. Rutzig

Corresponding author

Correspondence to Mateus B. Rutzig.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This study was financed in part by: CNPq; FAPERGS/CNPq 11/2014 - PRONEM; and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

About this article

Cite this article

Knorst, T., Vicenzi, J., Jordan, M.G. et al. An energy efficient multi-target binary translator for instruction and data level parallelism exploitation. Des Autom Embed Syst 26, 55–82 (2022). https://doi.org/10.1007/s10617-021-09258-6

Received: 14 October 2020
Accepted: 13 November 2021
Published: 14 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10617-021-09258-6
