Code Density and Energy Efficiency of Exposed Datapath Architectures
Exposing details of the processor datapath to the programmer is motivated by improved energy efficiency and a simpler microarchitecture. However, an instruction format that controls the datapath more explicitly must be more expressive than one that leaves more of the control logic to the processor hardware and presents conventional general-purpose-register-based instructions to the programmer. That is, programs for exposed datapath processors might require additional instruction memory bits to be fetched, which consumes additional energy. With interest in energy and power efficiency rising over the past decade, exposed datapath architectures have received renewed attention. Several variations of the additional details to expose to the programmer have been proposed in academia, and some exposed datapath features have also appeared in commercial architectures. However, the different variations of proposed exposed datapath architectures and their effects on energy efficiency have not yet been publicly analyzed in a systematic manner. This article reviews exposed datapath approaches and highlights their differences. In addition, a set of interesting exposed datapath design choices is evaluated in a closer study. Because memories constitute a major component of power consumption in contemporary processors, we analyze instruction encodings for different exposed datapath variations and compare the energy required to fetch the additional instruction bits with the register file access savings achieved with the exposed datapath.
Keywords: Processor architectures, Exposed datapath architectures, Software bypassing, Low power computing, VLIW, TTA
This work was funded by the Academy of Finland (funding decision 253087), the Finnish Funding Agency for Technology and Innovation (project "Parallel Acceleration", funding decision 40115/13), and the ARTEMIS Joint Undertaking under grant agreement no. 641439 (ALMARVI).