Abstract
Many emerging memory technologies are candidates for building future energy-efficient memory hierarchies, so a framework is needed that can quickly find the optimal memory technology for each level of the hierarchy. In this work, we first build a circuit-architecture joint design space exploration framework that combines RC circuit analysis with ANN-based performance modeling. We then use this framework to evaluate several emerging nonvolatile memory hierarchies. We demonstrate that an ReRAM-based cache hierarchy on an 8-core CMP system can achieve a 28% EDP (energy-delay product) improvement and a 39% EDAP (energy-delay-area product) improvement over a conventional hierarchy with SRAM on-chip caches and DRAM main memory.
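As a minimal sketch of how the two figures of merit named in the abstract are formed, the snippet below computes EDP as energy times delay and EDAP as energy times delay times area, then reports the relative improvement of a candidate hierarchy over a baseline. The function names and the normalized input values are hypothetical and chosen only for illustration; they are not results from this chapter.

```python
def edp(energy, delay):
    """Energy-delay product: lower is better."""
    return energy * delay


def edap(energy, delay, area):
    """Energy-delay-area product: lower is better."""
    return energy * delay * area


def improvement(baseline, candidate):
    """Fractional reduction of a lower-is-better metric relative to the baseline."""
    return 1.0 - candidate / baseline


# Hypothetical values normalized to the baseline hierarchy (illustration only).
base = {"energy": 1.0, "delay": 1.0, "area": 1.0}
cand = {"energy": 0.9, "delay": 0.9, "area": 0.8}

edp_gain = improvement(edp(base["energy"], base["delay"]),
                       edp(cand["energy"], cand["delay"]))
edap_gain = improvement(edap(**base), edap(**cand))

print(f"EDP improvement:  {edp_gain:.1%}")
print(f"EDAP improvement: {edap_gain:.1%}")
```

Because both metrics are products, a design that trades a small delay penalty for a larger energy or area saving can still come out ahead.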
Extension of Conference Paper: This submission is extended from “A Circuit-Architecture Co-optimization Framework for Evaluating Emerging Memory Hierarchies,” published at ISPASS’13. The additional material in this submission includes a detailed explanation of the circuit-level and architecture-level models, a new case study using PCRAM, and a sensitivity study on processor core counts. This work is supported in part by SRC grants, by NSF grants 1218867, 1213052, and 0903432, and by DoE under Award Number DE-SC0005026.
Notes
1. Also called PCM or PRAM.
2. Also called STT-MRAM or MRAM.
3. Also called RRAM, CBRAM, or memristor.
4. Silicon area is the main factor that determines chip cost.
5. We also use the same methodology to collect the data for a 4-core CMP performance model, and the result is shown in Sect. 10.6.4.
6. Another 23 ANN models are built for the 4-core CMP model.
7. There are other models explaining the ReRAM working mechanism.
8. In this work, a neighboring option is generated by changing two parameters from the parameter set of L1 capacity, L1 associativity, L1 memory type, L2 capacity, L2 associativity, L2 memory type, L3 capacity, L3 associativity, and L3 memory type.
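The neighbor-generation rule in the last note can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the option lists in `DESIGN_SPACE`, the key names, and the `neighbor` function are all hypothetical, and the sketch additionally assumes that a "changed" parameter must take a value different from its current one.

```python
import random

# Hypothetical option lists for the nine parameters named in the note.
MEM_TYPES = ["SRAM", "STT-RAM", "PCRAM", "ReRAM"]
DESIGN_SPACE = {
    "l1_capacity_kb": [16, 32, 64],
    "l1_assoc": [2, 4, 8],
    "l1_mem": MEM_TYPES,
    "l2_capacity_kb": [256, 512, 1024],
    "l2_assoc": [4, 8, 16],
    "l2_mem": MEM_TYPES,
    "l3_capacity_mb": [4, 8, 16, 32],
    "l3_assoc": [8, 16, 32],
    "l3_mem": MEM_TYPES,
}


def neighbor(config, rng, n_changes=2):
    """Return a copy of config with n_changes randomly chosen parameters altered."""
    new_config = dict(config)
    # Pick which parameters to perturb (two, per the note).
    for key in rng.sample(sorted(DESIGN_SPACE), n_changes):
        # Assumption: the new value must differ from the current one.
        choices = [v for v in DESIGN_SPACE[key] if v != config[key]]
        new_config[key] = rng.choice(choices)
    return new_config


rng = random.Random(0)  # seeded so the sketch is repeatable
start = {key: options[0] for key, options in DESIGN_SPACE.items()}
candidate = neighbor(start, rng)
changed = sorted(k for k in start if start[k] != candidate[k])
print("changed parameters:", changed)
```

In a simulated-annealing loop, a function like this would propose the next design point, which is then accepted or rejected based on its cost (for example, predicted EDP) and the current temperature.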
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Dong, X., Jouppi, N.P., Xie, Y. (2014). A Circuit-Architecture Co-optimization Framework for Exploring Nonvolatile Memory Hierarchies. In: Xie, Y. (eds) Emerging Memory Technologies. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9551-3_10
DOI: https://doi.org/10.1007/978-1-4419-9551-3_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9550-6
Online ISBN: 978-1-4419-9551-3
eBook Packages: Engineering, Engineering (R0)