Abstract
This chapter describes how to adopt spin-transfer torque random access memory (STT-RAM) as an on-chip L2 cache to achieve better performance and lower energy consumption than traditional L2 cache designs. STT-RAM is a promising memory technology for on-chip caches because of its fast read access, high density, and non-volatility. With 3D heterogeneous integration, it becomes feasible and cost-efficient to stack STT-RAM atop conventional chip multiprocessors (CMPs). However, STT-RAM has two disadvantages: long write latency and high write energy. We first stack STT-RAM-based L2 caches directly atop CMPs and compare them against their SRAM counterparts in terms of performance and energy. We observe that direct STT-RAM stacking can harm chip performance because of the long write latency and high write energy. To solve this problem, we then propose two architectural techniques: a read-preemptive write buffer and an SRAM–STT-RAM hybrid L2 cache. Simulation results show that our optimized STT-RAM L2 cache improves performance by 4.91 % and reduces power by 73.5 % compared to a conventional SRAM L2 cache of similar area.
Notes
1. TPKI is the number of total transactions per 1K instructions, and WPKI is the number of write transactions per 1K instructions.
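As a small worked illustration of the two metrics defined in this note, the Python sketch below computes TPKI and WPKI from raw counts. The function names and the sample counts are hypothetical and chosen only for the arithmetic; they are not measurements from the chapter.

```python
def per_kilo_instructions(event_count: int, instruction_count: int) -> float:
    """Return the number of events per 1,000 executed instructions."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    if event_count < 0:
        raise ValueError("event_count must be non-negative")
    return event_count * 1000.0 / instruction_count


def tpki_wpki(total_transactions: int,
              write_transactions: int,
              instruction_count: int) -> tuple[float, float]:
    """Return (TPKI, WPKI) for one workload.

    TPKI counts all transactions per 1K instructions;
    WPKI counts only the write transactions per 1K instructions.
    """
    if write_transactions > total_transactions:
        raise ValueError("writes cannot exceed total transactions")
    tpki = per_kilo_instructions(total_transactions, instruction_count)
    wpki = per_kilo_instructions(write_transactions, instruction_count)
    return tpki, wpki


# Hypothetical counts, for illustration only (not data from the chapter):
# 2,000,000 instructions, 9,000 transactions, of which 3,000 are writes.
tpki, wpki = tpki_wpki(9_000, 3_000, 2_000_000)
print(f"TPKI = {tpki}, WPKI = {wpki}")  # TPKI = 4.5, WPKI = 1.5
```

Because writes are a subset of all transactions, WPKI can never exceed TPKI. The ratio WPKI/TPKI gives the write share of the traffic, which is the quantity most relevant to a memory whose writes are slow and energy-hungry.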
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Sun, G., Dong, X., Chen, Y., Xie, Y. (2014). An Energy-Efficient 3D Stacked STT-RAM Cache Architecture for CMPs. In: Xie, Y. (eds) Emerging Memory Technologies. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9551-3_6
DOI: https://doi.org/10.1007/978-1-4419-9551-3_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9550-6
Online ISBN: 978-1-4419-9551-3
eBook Packages: Engineering, Engineering (R0)