Journal of Signal Processing Systems, Volume 80, Issue 1, pp 49–64

Code Density and Energy Efficiency of Exposed Datapath Architectures

  • Pekka Jääskeläinen
  • Heikki Kultala
  • Timo Viitanen
  • Jarmo Takala

Abstract

Exposing details of the processor datapath to the programmer is motivated by improved energy efficiency and a simpler microarchitecture. However, an instruction format that controls the datapath more explicitly needs more expressiveness than one that implements more of the control logic in hardware and presents conventional general-purpose-register-based instructions to the programmer. As a result, programs for exposed datapath processors may require additional instruction memory bits to be fetched, which consumes additional energy. With the growing interest in energy and power efficiency over the past decade, exposed datapath architectures have received renewed attention. Academic research has proposed several variations on which datapath details to expose to the programmer, and some exposed datapath features have also appeared in commercial architectures. However, the proposed exposed datapath variations and their effects on energy efficiency have not yet been systematically analyzed in the public literature. This article reviews exposed datapath approaches and highlights their differences. In addition, a selected set of exposed datapath design choices is evaluated in more detail. Because memories are a major contributor to power consumption in contemporary processors, we analyze instruction encodings for different exposed datapath variations and weigh the energy required to fetch the additional instruction bits against the register file access savings achieved with the exposed datapath.
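The trade-off in the last sentence of the abstract can be made concrete with a first-order energy balance. The sketch below is a hypothetical illustration, not the article's evaluation methodology. It assumes a fixed energy cost per fetched instruction bit and fixed costs per register file read and write, and it ignores other effects such as decoding, interconnect and instruction compression. The names `EnergyParams`, `net_energy_saving` and `break_even_extra_bits`, and all numeric values, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class EnergyParams:
    """Hypothetical per-event energy costs, in any consistent unit (e.g. pJ)."""
    fetch_per_bit: float  # energy to fetch one instruction bit
    rf_read: float        # energy of one register file read
    rf_write: float       # energy of one register file write


def rf_energy_saved(params, rf_reads_saved, rf_writes_saved):
    """Register file energy avoided, e.g. through software bypassing."""
    return rf_reads_saved * params.rf_read + rf_writes_saved * params.rf_write


def net_energy_saving(params, instr_count, extra_bits_per_instr,
                      rf_reads_saved, rf_writes_saved):
    """Return (RF energy saved) - (extra instruction fetch energy).

    A positive result means the wider exposed datapath encoding pays off
    under this simplified first-order model.
    """
    extra_fetch = instr_count * extra_bits_per_instr * params.fetch_per_bit
    return rf_energy_saved(params, rf_reads_saved, rf_writes_saved) - extra_fetch


def break_even_extra_bits(params, instr_count, rf_reads_saved, rf_writes_saved):
    """Per-instruction width increase at which the savings exactly cancel out."""
    saved = rf_energy_saved(params, rf_reads_saved, rf_writes_saved)
    return saved / (instr_count * params.fetch_per_bit)


if __name__ == "__main__":
    # Made-up values, chosen only to exercise the model.
    params = EnergyParams(fetch_per_bit=0.125, rf_read=2.0, rf_write=2.5)
    print(net_energy_saving(params, 1000, 8, 300, 200))   # 100.0
    print(break_even_extra_bits(params, 1000, 300, 200))  # 8.8
```

In this toy setting, widening each instruction by more than about 8.8 bits would cost more fetch energy than the avoided register file accesses save. A real analysis would derive the per-bit and per-access energies from memory and register file models for the target technology.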

Keywords

Processor architectures · Exposed datapath architectures · Software bypassing · Low power computing · VLIW · TTA

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Pekka Jääskeläinen, Heikki Kultala, Timo Viitanen, Jarmo Takala
    Tampere University of Technology, Tampere, Finland