An Energy-Efficient 3D Stacked STT-RAM Cache Architecture for CMPs

Emerging Memory Technologies

Abstract

In this chapter, we show how spin-transfer torque random access memory (STT-RAM) can be adopted as an on-chip L2 cache to achieve better performance and lower energy consumption than traditional L2 cache designs. STT-RAM is a promising memory technology for on-chip caches because of its fast read access, high density, and non-volatility. With 3D heterogeneous integration, it becomes feasible and cost-efficient to stack STT-RAM atop conventional chip multiprocessors (CMPs). However, STT-RAM has two disadvantages: long write latency and high write energy. We first stack STT-RAM-based L2 caches directly atop CMPs and compare them against their SRAM counterparts in terms of performance and energy. We observe that direct STT-RAM stacking can harm chip performance because of the long write latency and high write energy. To solve this problem, we then propose two architectural techniques: a read-preemptive write buffer and an SRAM–STT-RAM hybrid L2 cache. Simulation results show that our optimized STT-RAM L2 cache improves performance by 4.91% and reduces power by 73.5% compared to a conventional SRAM L2 cache of similar area.

Notes

  1. TPKI is the number of total transactions per 1K instructions, and WPKI is the number of write transactions per 1K instructions.
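
As a minimal sketch of the two metrics defined in this note, both can be computed as an event count normalized to 1,000 instructions. The function name and the counter values below are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
def per_kilo_instructions(event_count: int, instruction_count: int) -> float:
    """Return the number of events per 1,000 executed instructions."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return event_count * 1000.0 / instruction_count


# Hypothetical counter values (assumptions, not measurements from the chapter).
instructions = 2_000_000
total_transactions = 30_000   # all transactions, reads and writes
write_transactions = 9_000    # the write subset of the above

tpki = per_kilo_instructions(total_transactions, instructions)
wpki = per_kilo_instructions(write_transactions, instructions)

print(f"TPKI = {tpki:.2f}, WPKI = {wpki:.2f}")
```

Because write transactions are a subset of total transactions, WPKI can never exceed TPKI for the same run.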

Author information

Corresponding author

Correspondence to Yuan Xie.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Sun, G., Dong, X., Chen, Y., Xie, Y. (2014). An Energy-Efficient 3D Stacked STT-RAM Cache Architecture for CMPs. In: Xie, Y. (ed.) Emerging Memory Technologies. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9551-3_6

  • DOI: https://doi.org/10.1007/978-1-4419-9551-3_6

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-9550-6

  • Online ISBN: 978-1-4419-9551-3

  • eBook Packages: Engineering, Engineering (R0)
