Code Density and Energy Efficiency of Exposed Datapath Architectures

Abstract

Exposing details of the processor datapath to the programmer is motivated by improved energy efficiency and a simpler microarchitecture. However, an instruction format that controls the datapath more explicitly must be more expressive than one that leaves more of the control logic to the processor hardware and presents conventional general-purpose-register-based instructions to the programmer. That is, programs for exposed datapath processors might require additional instruction memory bits to be fetched, which consumes additional energy. With interest in energy and power efficiency rising over the past decade, exposed datapath architectures have received renewed attention. Several variations of the additional details to expose to the programmer have been proposed in academia, and some exposed datapath features have also appeared in commercial architectures. However, the proposed variations and their effects on energy efficiency have not so far been publicly analyzed in a systematic manner. This article reviews exposed datapath approaches and highlights their differences. In addition, a set of interesting exposed datapath design choices is evaluated in a closer study. Because memories constitute a major component of power consumption in contemporary processors, we analyze instruction encodings for different exposed datapath variations and compare the energy required to fetch the additional instruction bits with the register file access savings achieved with the exposed datapath.
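
The trade-off described in the abstract can be illustrated with a minimal first-order sketch. This is not the article's evaluation model: the `EnergyModel` class, the `net_energy_pj` function, and every number below are hypothetical placeholders chosen only to show the arithmetic of weighing extra instruction fetch energy against avoided register file accesses.

```python
from dataclasses import dataclass


@dataclass
class EnergyModel:
    """Hypothetical per-event energies in picojoules (illustrative only)."""
    fetch_pj_per_bit: float  # instruction memory energy per fetched bit
    rf_read_pj: float        # energy of one register file read
    rf_write_pj: float       # energy of one register file write


def net_energy_pj(model: EnergyModel,
                  instructions: int,
                  extra_bits_per_instruction: int,
                  rf_reads_avoided: int,
                  rf_writes_avoided: int) -> float:
    """Return fetch overhead minus register file savings.

    A positive result means the wider encoding costs more than the
    bypassed register file accesses save; a negative result means the
    exposed datapath comes out ahead under this simplified model.
    """
    fetch_overhead = (instructions
                      * extra_bits_per_instruction
                      * model.fetch_pj_per_bit)
    rf_savings = (rf_reads_avoided * model.rf_read_pj
                  + rf_writes_avoided * model.rf_write_pj)
    return fetch_overhead - rf_savings


# Assumed example values, not measurements from the article.
model = EnergyModel(fetch_pj_per_bit=0.5, rf_read_pj=4.0, rf_write_pj=5.0)
balance = net_energy_pj(model,
                        instructions=1000,
                        extra_bits_per_instruction=8,
                        rf_reads_avoided=600,
                        rf_writes_avoided=300)
print(f"net energy change: {balance} pJ")
```

With these assumed values the fetch overhead is 4000 pJ and the register file savings are 3900 pJ, so the wider encoding loses by a small margin. The sketch ignores effects such as instruction compression, memory hierarchy, and static power, which a real comparison would need to account for.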

Acknowledgments

This work was funded by the Academy of Finland (funding decision 253087), the Finnish Funding Agency for Technology and Innovation (project "Parallel Acceleration", funding decision 40115/13), and the ARTEMIS Joint Undertaking under grant agreement no. 641439 (ALMARVI).

Author information

Corresponding author

Correspondence to Pekka Jääskeläinen.

About this article

Cite this article

Jääskeläinen, P., Kultala, H., Viitanen, T. et al. Code Density and Energy Efficiency of Exposed Datapath Architectures. J Sign Process Syst 80, 49–64 (2015). https://doi.org/10.1007/s11265-014-0924-x

Keywords

  • Processor architectures
  • Exposed datapath architectures
  • Software bypassing
  • Low power computing
  • VLIW
  • TTA