Instruction Fetch Energy Reduction with Biased SRAMs
In programmable processors especially, the energy consumption of integrated memories can become a limiting design factor because of thermal dissipation constraints and limited battery capacity. Consequently, current work on memory technologies focuses increasingly on energy efficiency, which has resulted in biased CMOS SRAM cells that save energy by favoring one logical value over the other. This paper proposes xor-masking, a method that exploits such low-power SRAMs to improve the energy efficiency of instruction fetching. Xor-masking uses statistics from static program analysis to produce optimal encoding masks that reduce the occurrence of the more energy-consuming bit value in the fetched instructions. The method is evaluated on LatticeMico32, a small RISC core popular in ultra-low-power designs, and on a high-performance, low-power DSP with a wide instruction word. Compared to the previous "bus invert" technique typically used with similar SRAMs, the proposed method reduces instruction read energy consumption by up to 13% on the LatticeMico32 and 38% on the DSP core.
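The masking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single program-wide mask, assumes that a stored 1 is the more expensive value to read, and uses the hypothetical helper names `build_xor_mask`, `apply_mask`, and `count_ones`. The mask is derived from per-bit-position statistics of the program image, each instruction is stored XORed with the mask, and the same XOR restores the original word after the fetch.

```python
from typing import List


def build_xor_mask(words: List[int], width: int) -> int:
    """Set mask bit i when more than half of the words have a 1 at bit i.

    Assumption: reading a 1 costs more energy than reading a 0, so
    flipping majority-1 bit positions lowers the number of stored 1s.
    """
    mask = 0
    for bit in range(width):
        ones = sum((w >> bit) & 1 for w in words)
        if 2 * ones > len(words):
            mask |= 1 << bit
    return mask


def apply_mask(words: List[int], mask: int) -> List[int]:
    """XOR every word with the mask; applying it twice restores the input."""
    return [w ^ mask for w in words]


def count_ones(words: List[int]) -> int:
    """Total number of 1 bits, used here as a crude proxy for read energy."""
    return sum(bin(w).count("1") for w in words)


# Toy 4-bit "program image" (made-up values, not real instructions).
program = [0b1110, 0b1100, 0b1011, 0b0110]
mask = build_xor_mask(program, width=4)
encoded = apply_mask(program, mask)   # what would be stored in the SRAM
decoded = apply_mask(encoded, mask)   # what the fetch stage would recover

print(f"mask={mask:04b} ones before={count_ones(program)} after={count_ones(encoded)}")
```

In this toy example the three upper bit positions are mostly 1, so the mask flips them and the number of stored 1s drops, while decoding needs only one XOR per fetched word. Bus-invert coding, by contrast, decides per word whether to invert and stores an extra flag bit alongside it.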
Keywords: Asymmetric SRAM, Energy optimization, Instruction fetch, Low-power processors
The authors would like to thank the TUT Graduate School, Academy of Finland (project PLC), Finnish Funding Agency for Technology and Innovation (project "Parallel Acceleration 3", funding decision 1134/31/2015), and ARTEMIS JU under grant agreement no. 621439 (ALMARVI).