Abstract
Spintronic memory (STT-MRAM) is an attractive alternative to CMOS because it offers higher density and virtually no leakage current. It still requires higher write energy, however, which complicates memory hierarchy design when energy consumption is a concern. Techniques for reducing write energy have previously been studied for single processors, typically focusing on the last-level cache while keeping the first-level caches in CMOS to avoid the write latency. This work motivates the use of STT-MRAM for the first-level caches of a multicore processor by showing that the throughput lost to increased write latency is offset in many cases by the larger cache that the higher density allows. The PARSEC benchmark suite is run on a modern multicore platform simulator to compare the performance and energy consumption of the spintronic cache system with those of a CMOS design. A small, fully associative level-0 cache (on the order of 8-64 cache lines) is then introduced and shown to effectively hide the STT-MRAM write latency. The performance lost to write latency is recovered or slightly improved upon, while cache energy consumption is reduced by 30-50% for 12 of the 13 benchmarks.
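To illustrate why a small level-0 cache can shield a slower-writing first-level cache, the following is a minimal sketch and not the authors' simulator configuration. It assumes a write-back policy with LRU replacement, neither of which is stated in the abstract, and the names `L0Cache` and `l1_writes_for` are hypothetical. It counts only how many writes reach the level-1 cache; it does not model latency or energy.

```python
from collections import OrderedDict


class L0Cache:
    """Toy fully associative, write-back cache with LRU replacement (assumed policies)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line address -> dirty flag, LRU entry first
        self.l1_writes = 0          # dirty evictions forwarded to the level-1 cache

    def access(self, line_addr, is_write):
        """Record one access; return True on an L0 hit, False on a miss."""
        hit = line_addr in self.lines
        if hit:
            # Remove the entry so it can be re-inserted as most recently used.
            dirty = self.lines.pop(line_addr) or is_write
        else:
            if len(self.lines) >= self.num_lines:
                # Evict the least recently used line; write it back if dirty.
                _, victim_dirty = self.lines.popitem(last=False)
                if victim_dirty:
                    self.l1_writes += 1
            dirty = is_write
        self.lines[line_addr] = dirty
        return hit


def l1_writes_for(trace, num_lines):
    """Count writes reaching L1 for a trace of (line_addr, is_write) pairs."""
    cache = L0Cache(num_lines)
    for line_addr, is_write in trace:
        cache.access(line_addr, is_write)
    # Lines still dirty at the end must eventually be written back as well.
    return cache.l1_writes + sum(cache.lines.values())


# 400 stores that cycle over a working set of 4 cache lines.
trace = [(addr, True) for _ in range(100) for addr in range(4)]
print(l1_writes_for(trace, 8))  # working set fits in L0: 4 write-backs
print(l1_writes_for(trace, 2))  # working set thrashes L0: 400 write-backs
```

In this toy trace, an 8-line level-0 cache absorbs all repeated stores, so only the final write-backs reach the level-1 cache, whereas a 2-line cache thrashes under LRU and forwards every store. Real workloads fall between these extremes.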
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tuohy, W., Ma, C., Nandkar, P., Borse, N., Lilja, D.J. (2014). Improving Energy and Performance with Spintronics Caches in Multicore Systems. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_24
DOI: https://doi.org/10.1007/978-3-319-14313-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)