A Circuit-Architecture Co-optimization Framework for Exploring Nonvolatile Memory Hierarchies

Emerging Memory Technologies

Abstract

Many new memory technologies are available for building future energy-efficient memory hierarchies, so a framework that can quickly find the best memory technology for each level of the hierarchy is needed. In this work, we first build a joint circuit-architecture design space exploration framework by combining RC circuit analysis with performance modeling based on artificial neural networks (ANNs). We then use this framework to evaluate several emerging nonvolatile memory hierarchies. We demonstrate that an ReRAM-based cache hierarchy on an 8-core CMP system can achieve a 28% EDP (energy-delay product) improvement and a 39% EDAP (energy-delay-area product) improvement compared to a conventional hierarchy with SRAM on-chip caches and DRAM main memory.
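As a rough illustration of the two figures of merit named in the abstract, the sketch below computes EDP and EDAP as plain products and reports an improvement as the relative reduction against a baseline. The function names, the units, and the reading of "improvement" as a relative reduction are assumptions made for this example; the chapter's exact definition is not given on this page.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_j * delay_s


def edap(energy_j: float, delay_s: float, area_mm2: float) -> float:
    """Energy-delay-area product: lower is better."""
    return energy_j * delay_s * area_mm2


def improvement(baseline: float, candidate: float) -> float:
    """Relative reduction of a lower-is-better metric (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - candidate / baseline


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the chapter.
    base = edp(energy_j=2.0, delay_s=1.0)
    cand = edp(energy_j=1.5, delay_s=1.1)
    print(f"EDP improvement: {improvement(base, cand):.1%}")
```

Under this reading, a candidate whose EDP is 72% of the baseline's would count as a 28% improvement.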

Extension of Conference Paper: This submission extends “A Circuit-Architecture Co-optimization Framework for Evaluating Emerging Memory Hierarchies,” published at ISPASS’13. The additional material includes a detailed explanation of the circuit-level and architecture-level models, a new case study using PCRAM, and a sensitivity study on processor core counts. This work is supported in part by SRC grants, by NSF grants 1218867, 1213052, and 0903432, and by DoE under Award Number DE-SC0005026.

Notes

  1. Also called PCM or PRAM.

  2. Also called STT-MRAM or MRAM.

  3. Also called RRAM, CBRAM, or memristor.

  4. Silicon area is the main factor that determines chip cost.

  5. We also use the same methodology to collect the data for a 4-core CMP performance model, and the result is shown in Sect. 10.6.4.

  6. Another 23 ANN models are built for the 4-core CMP model.

  7. There are other models explaining the ReRAM working mechanism.

  8. In this work, a neighboring option is generated by changing two parameters from the parameter set of L1 capacity, L1 associativity, L1 memory type, L2 capacity, L2 associativity, L2 memory type, L3 capacity, L3 associativity, and L3 memory type.
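
The neighbor-generation rule in note 8 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nine parameter names follow the note, but the candidate values in `DESIGN_SPACE`, the helper name `neighbor`, and the choice to force each selected parameter to take a different value are all assumptions made for the example.

```python
import random

# Hypothetical candidate values; the chapter's actual ranges are not listed here.
MEMORY_TYPES = ["SRAM", "STT-RAM", "PCRAM", "ReRAM"]
DESIGN_SPACE = {
    "l1_capacity_kb": [16, 32, 64],
    "l1_associativity": [2, 4, 8],
    "l1_memory_type": MEMORY_TYPES,
    "l2_capacity_kb": [256, 512, 1024],
    "l2_associativity": [4, 8, 16],
    "l2_memory_type": MEMORY_TYPES,
    "l3_capacity_kb": [4096, 8192, 16384],
    "l3_associativity": [8, 16, 32],
    "l3_memory_type": MEMORY_TYPES,
}


def neighbor(option: dict, rng: random.Random, n_changes: int = 2) -> dict:
    """Return a copy of `option` with `n_changes` distinct parameters altered."""
    new_option = dict(option)
    # Pick which parameters to perturb, without repetition.
    for name in rng.sample(sorted(DESIGN_SPACE), n_changes):
        # Exclude the current value so the parameter really changes.
        choices = [v for v in DESIGN_SPACE[name] if v != option[name]]
        new_option[name] = rng.choice(choices)
    return new_option


if __name__ == "__main__":
    rng = random.Random(0)
    start = {name: values[0] for name, values in DESIGN_SPACE.items()}
    print(neighbor(start, rng))
```

A move of this kind is what a local-search method such as simulated annealing would evaluate at each step; changing only two of the nine parameters keeps each candidate close to the current design point.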

Author information

Corresponding author

Correspondence to Yuan Xie.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Dong, X., Jouppi, N.P., Xie, Y. (2014). A Circuit-Architecture Co-optimization Framework for Exploring Nonvolatile Memory Hierarchies. In: Xie, Y. (eds) Emerging Memory Technologies. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9551-3_10

  • DOI: https://doi.org/10.1007/978-1-4419-9551-3_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-9550-6

  • Online ISBN: 978-1-4419-9551-3

  • eBook Packages: Engineering, Engineering (R0)
