Journal of Signal Processing Systems

, Volume 90, Issue 11, pp 1519–1532 | Cite as

Instruction Fetch Energy Reduction with Biased SRAMs

  • Joonas MultanenEmail author
  • Timo Viitanen
  • Pekka Jääskeläinen
  • Jarmo Takala


Especially in programmable processors, energy consumption of integrated memories can become a limiting design factor due to thermal dissipation power constraints and limited battery capacity. Consequently, contemporary improvement efforts on memory technologies are focusing more on the energy-efficiency aspects, which has resulted in biased CMOS SRAM cells that increase energy efficiency by favoring one logical value over another. In this paper, xor-masking, a method for exploiting such contemporary low power SRAM memories is proposed to improve the energy-efficiency of instruction fetching. Xor-masking utilizes static program analysis statistics to produce optimal encoding masks to reduce the occurrence of the more energy consuming instruction bit values in the fetched instructions. The method is evaluated on LatticeMico32, a small RISC core popular in ultra low power designs, and on a wide instruction word high performance low power DSP. Compared to the previous “bus invert” technique typically used with similar SRAMs, the proposed method reduces instruction read energy consumption of the LatticeMico32 by up to 13% and 38% on the DSP core.


Asymmetric SRAM Energy optimization Instruction fetch Low-power processors 



The authors would like to thank the TUT Graduate School, Academy of Finland (project PLC), Finnish Funding Agency for Technology and Innovation (project ”Parallel Acceleration 3”, funding decision 1134/31/2015), and ARTEMIS JU under grant agreement no 621439 (ALMARVI).


  1. 1.
    Atzori, L, Iera, A, Morabito, G. (2010). The internet of things: a survey. Computer Networks, 54(15), 2787–2805.CrossRefGoogle Scholar
  2. 2.
    Taylor, M. (2012). Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse. In Proceedings of the 49th annual design automation conference.Google Scholar
  3. 3.
    Bol, D., De Vos, J., Hocquet, C., Botman, F., Durvaux, F., Boyd, S., Flandre, D., Legat, J. (2013). SleepWalker: a 25-MHz 0.4-V Sub-mm2 7- μ m 2 μ W/MHz microcontroller in 65-nm LP/GP CMOS for low-carbon wireless sensor nodes. IEEE Journal of Solid-State Circuits, 48(1), 20–32.CrossRefGoogle Scholar
  4. 4.
    Carroll, A., & Heiser, G. (2010). An analysis of power consumption in a smartphone. In Proceedings of the USENIX annual technical conference. Boston.Google Scholar
  5. 5.
    Fong, X., Kim, Y., Yogendra, K., Fan, D., Sengupta, A., Raghunathan, A., Roy, K. (2016). Spin-transfer torque devices for logic and memory: prospects and perspectives. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35(1), 1–22.CrossRefGoogle Scholar
  6. 6.
    Hu, J., Xue, C. J., Zhuge, Q., Tseng, W. C., Sha, E. H. -M. (2011). Towards energy efficient hybrid on-chip scratch pad memory with non-volatile memory. In Design, automation test in europe conference exhibition.Google Scholar
  7. 7.
    Benini, L., Macii, A., Poncino, M. (2003). Energy-aware design of embedded memories: a survey of technologies, architectures, and optimization techniques. Transactions on Embedded Computing Systems, 2(1), 5–32.CrossRefGoogle Scholar
  8. 8.
    ISSCC. (2016). ISSCC 2016 tech trends.
  9. 9.
    Azizi, N., & Najm, F. N. (2004). An asymmetric SRAM cell to lower gate leakage. In Proceedings of the 5th international symposium on quality electronic design. Hangzhou.Google Scholar
  10. 10.
    Imani, M., Patil, S., Rosing, T. S. (2015). Hierarchical design of robust and low data dependent FinFET based SRAM array. In Proceedings of the international symposium on nanoscale architectures. Boston.Google Scholar
  11. 11.
    Mori, H., Nakagawa, T., Kitahara, Y., Kawamoto, Y., Takagi, K., Yoshimoto, S., Izumi, S., Nii, K., Kawaguchi, H., Yoshimoto, M. (2015). A 298-fJ/writecycle 650-fJ/readcycle 8T three-port SRAM in 28-nm FD-SOI process technology for image processor. In Proceedings of the IEEE custom integrated circuits conference. San Jose.Google Scholar
  12. 12.
    Teman, A., Mordakhay, A., Mezhibovsky, J., Fish, A. (2012). A 40-nm sub-threshold 5T SRAM bit cell with improved read and write stability. IEEE Transactions on Circuits and Systems II: Express Briefs, 59(12), 873–877.CrossRefGoogle Scholar
  13. 13.
    Young, K. K. (1989). Short-channel effect in fully depleted soi mosfets. IEEE Transactions on Electron Devices, 36(2), 399–402.CrossRefGoogle Scholar
  14. 14.
    Multanen, J., Viitanen, T., Jääskeläinen, P., Takala, J. (2016). Xor-masking: a novel statistical method for instruction read energy reduction in contemporary SRAM technologies. In International workshop on signal processing systems. Dallas.Google Scholar
  15. 15.
    Stan, M. R., & Burleson, W. P. (1995). Bus-invert coding for low-power I/O. IEEE Transactions on Very Large Scale Integration Systems, 3(1), 49–58.CrossRefGoogle Scholar
  16. 16.
    Shin, Y., Chae, S. -I., Choi, K. (2001). Partial bus-invert coding for power optimization of application-specific systems. IEEE Transactions on Very Large Scale Integration Systems, 9(2), 377–383.CrossRefGoogle Scholar
  17. 17.
    Ji, G., & Hui, G. (2009). A segmental bus-invert coding method for instruction memory data bus power efficiency. In Proceedings of the IEEE international symposium on circuits and systems. Taipei.Google Scholar
  18. 18.
    Petrov, P., & Orailoglu, A. (2003). Application-specific instruction memory customizations for power-efficient embedded processors. IEEE Design Test of Computers, 20(1), 18–25.CrossRefGoogle Scholar
  19. 19.
    Su, C., Tsui, C., Despain, A. (1994). Saving power in the control path of embedded processors. IEEE Design and Test of Computers, 11(4), 24–31.CrossRefGoogle Scholar
  20. 20.
    Musoll, E., Lang, T., Cortadella, J. (1998). Working-zone encoding for reducing the energy in microprocessor address buses. IEEE Transactions on Very Large Scale Integration Systems, 6(4).CrossRefGoogle Scholar
  21. 21.
    Benini, L., De Micheli, G., Macii, E., Poncino, M., Quez, S. (1997). System-level power optimization of special purpose applications: the beach solution. In Proceedings of the international symposium on low power electronics and design. Monterey.Google Scholar
  22. 22.
    Yang, J., Gupta, R., Zhang, C. (2004). Frequent value encoding for low power data buses. ACM Transactions on Design Automation of Electronic Systems, 9(3), 354–384.CrossRefGoogle Scholar
  23. 23.
    Hennessy, J., & Patterson, D. (2002). Computer architecture: a quantitative approach, 3rd edn. San Francisco: Morgan Kaufmann Publishers Inc.,.zbMATHGoogle Scholar
  24. 24.
    Parhami, B. (1991). Design of m-out-of-n bit-voters. In Conference record of the twenty-fifth asilomar conference on signals, systems and computers (Vol. 2). Pacific Grove.Google Scholar
  25. 25.
    Suresh, D. C., Najjar, W. A., Vahid, F., Villarreal, J. R., Stitt, G. (2003). Profiling tools for hardware/software partitioning of embedded applications. SIGPLAN Notices, 38(7), 189–198.CrossRefGoogle Scholar
  26. 26.
  27. 27.
    Ben Salem, Z., Youssef, M. W., Abid, M. (2010). Prototyping cost-effective secure application server on a chip (sasoc) a case study for monitoring sensor network. In International conference on wireless and ubiquitous systems Sousse.Google Scholar
  28. 28.
    Schleuniger, P., McKee, S., Karlsson, S. (2012). Design principles for synthesizable processor cores.CrossRefGoogle Scholar
  29. 29.
    Multanen, J., Kultala, H., Koskela, M., Viitanen, T., Jääskeläinen, P., Takala, J., Danielyan, A., Cruz, C. (2016). Opencl programmable exposed datapath high performance low-power image signal processor. In IEEE Nordic circuits and systems conference.Google Scholar
  30. 30.
    Esko, O., Jääskeläinen, P., Huerta, P., de La Lama, C. S., Takala, J., Martinez, J. I. (2010). Customized exposed datapath soft-core design flow with compiler support. In Proceedings of international conference on field programmable logic and applications. Washington, DC.Google Scholar
  31. 31.
    Siti, M., & Fitz, M. P. (2006). A novel soft-output layered orthogonal lattice detector for multiple antenna communications. In International conference on communications (Vol. 4). Istanbul.Google Scholar
  32. 32.
    Hara, Y., Tomiyama, H., Honda, S., Takada, H. (2009). Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis. Journal of Information Processing, 17, 242–254.CrossRefGoogle Scholar
  33. 33.
    Zivojnovic, V., Martinez, J., Schlger, C., Meyr, H. (1994). DSPstone: a DSP-oriented benchmarking methodology. In Proceedings of the international conference on signal processing applications and technology. Dallas.Google Scholar
  34. 34.
    EEMBC –. (2016). The embedded microprocessor benchmark consortium. Coremark benchmark.
  35. 35.
    Wilhelm, R, Engblom, J, Ermedahl, A, Holsti, N, Thesing, S, Whalley, D, Bernat, G, Ferdinand, C, Heckmann, R, Mitra, T, Mueller, F, Puaut, I, Puschner, P, Staschulat, J, Stenström, P. (2008). The worst-case execution-time problem - overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems, 7(3), 1–53.CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Customized Parallel Computing Group, Laboratory of Pervasive ComputingTampere University of TechnologyTampereFinland

Personalised recommendations