Instruction Fetch Energy Reduction with Biased SRAMs

Journal of Signal Processing Systems

Abstract

Especially in programmable processors, the energy consumption of integrated memories can become a limiting design factor because of thermal dissipation constraints and limited battery capacity. Consequently, current memory technology development focuses increasingly on energy efficiency, which has resulted in biased CMOS SRAM cells that save energy by favoring one logical value over the other. This paper proposes xor-masking, a method that exploits such low-power SRAMs to improve the energy efficiency of instruction fetching. Xor-masking uses statistics from static program analysis to produce optimal encoding masks that reduce the occurrence of the more energy-consuming bit value in the fetched instructions. The method is evaluated on LatticeMico32, a small RISC core popular in ultra-low-power designs, and on a wide-instruction-word, high-performance, low-power DSP. Compared to the earlier “bus invert” technique typically used with similar SRAMs, the proposed method reduces instruction read energy consumption by up to 13% on the LatticeMico32 and 38% on the DSP core.
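The general idea of mask-based encoding can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a stored 1 is the more energy-consuming value, that a single mask covers the whole program, and that the mask is chosen by a per-bit majority vote over the static instruction words. The function names and the 16-bit example words are hypothetical. Decoding is the same XOR applied again at fetch time.

```python
from typing import List


def build_xor_mask(instructions: List[int], width: int) -> int:
    """Set a mask bit wherever the costly value (assumed to be 1)
    appears in more than half of the instruction words."""
    mask = 0
    for bit in range(width):
        ones = sum((word >> bit) & 1 for word in instructions)
        if 2 * ones > len(instructions):
            mask |= 1 << bit
    return mask


def apply_mask(instructions: List[int], mask: int) -> List[int]:
    """XOR every word with the mask; applying it twice restores the input."""
    return [word ^ mask for word in instructions]


def count_ones(instructions: List[int]) -> int:
    """Total number of 1 bits stored across all words."""
    return sum(bin(word).count("1") for word in instructions)


# Hypothetical 16-bit instruction words, for illustration only.
program = [0xF00F, 0xF0F0, 0xFF00, 0x00F0]
mask = build_xor_mask(program, 16)
encoded = apply_mask(program, mask)   # what would be stored in the SRAM
decoded = apply_mask(encoded, mask)   # what the core sees after unmasking

print(hex(mask), count_ones(program), count_ones(encoded))
```

Under these assumptions the only hardware cost at fetch time is one XOR gate per masked bit, and no per-word flag bit has to be stored alongside each instruction.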

Acknowledgments

The authors would like to thank the TUT Graduate School, the Academy of Finland (project PLC), the Finnish Funding Agency for Technology and Innovation (project “Parallel Acceleration 3”, funding decision 1134/31/2015), and ARTEMIS JU under grant agreement no. 621439 (ALMARVI).

Author information

Corresponding author

Correspondence to Joonas Multanen.

About this article

Cite this article

Multanen, J., Viitanen, T., Jääskeläinen, P. et al. Instruction Fetch Energy Reduction with Biased SRAMs. J Sign Process Syst 90, 1519–1532 (2018). https://doi.org/10.1007/s11265-018-1367-6

  • DOI: https://doi.org/10.1007/s11265-018-1367-6
