A Circuit-Architecture Co-optimization Framework for Exploring Nonvolatile Memory Hierarchies

Chapter

Abstract

Many new memory technologies are available for building future energy-efficient memory hierarchies, so a framework is needed that can quickly find the optimal memory technology for each level of the hierarchy. In this work, we first build a circuit-architecture joint design-space exploration framework that combines RC circuit analysis with artificial neural network (ANN)-based performance modeling. We then use this framework to evaluate several emerging nonvolatile memory hierarchies. We demonstrate that a ReRAM-based cache hierarchy on an 8-core chip multiprocessor (CMP) system can achieve a 28% improvement in EDP (Energy-Delay Product) and a 39% improvement in EDAP (Energy-Delay-Area Product) compared with a conventional hierarchy that uses SRAM on-chip caches and DRAM main memory.
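As a minimal sketch of how the EDP and EDAP comparisons described above can be computed, the Python below defines EDP = E × D and EDAP = E × D × A and picks the configuration that minimizes a chosen metric. It assumes that "X% improvement" means the fractional reduction 1 - candidate / baseline of a lower-is-better product; the chapter may define it differently. The DesignPoint class, its fields, the helper names, and all numbers in the usage section are hypothetical placeholders, not the chapter's framework or results.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical summary of one evaluated memory-hierarchy configuration."""
    energy_j: float  # total energy for a workload (joules)
    delay_s: float   # execution time for the same workload (seconds)
    area_mm2: float  # chip area of the configuration (mm^2)


def edp(p: DesignPoint) -> float:
    """Energy-Delay Product, E * D (lower is better)."""
    return p.energy_j * p.delay_s


def edap(p: DesignPoint) -> float:
    """Energy-Delay-Area Product, E * D * A (lower is better)."""
    return p.energy_j * p.delay_s * p.area_mm2


def improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction of a lower-is-better metric versus the baseline.

    Assumption: "X% improvement" is read as 1 - candidate / baseline.
    """
    return 1.0 - candidate / baseline


def best_design(points: dict[str, DesignPoint],
                metric: Callable[[DesignPoint], float]) -> str:
    """Return the name of the configuration that minimizes `metric`."""
    return min(points, key=lambda name: metric(points[name]))


if __name__ == "__main__":
    # Placeholder values normalized to the baseline; NOT data from the chapter.
    designs = {
        "sram_dram_baseline": DesignPoint(1.0, 1.0, 1.0),
        "nvm_candidate": DesignPoint(0.9, 0.95, 0.8),
    }
    base = designs["sram_dram_baseline"]
    cand = designs["nvm_candidate"]
    print(f"EDP improvement:  {improvement(edp(base), edp(cand)):.1%}")
    print(f"EDAP improvement: {improvement(edap(base), edap(cand)):.1%}")
    print("Best by EDAP:", best_design(designs, edap))
```

Because EDAP multiplies in area, a denser memory technology can improve EDAP by more than EDP even when energy and delay are identical, which is one reason the two figures can differ for the same configuration.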

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Qualcomm Technology, Inc., Qualcomm, USA
  2. Google Inc., CA, USA
  3. Pennsylvania State University and AMD Research, Pennsylvania, USA
