Improving Energy and Performance with Spintronics Caches in Multicore Systems

Tuohy, William; Ma, Cong; Nandkar, Pushkar; Borse, Nishant; Lilja, David J.

doi:10.1007/978-3-319-14313-2_24

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8806)

Abstract

Spintronic memory (STT-MRAM) is an attractive alternative to CMOS memory because it offers higher density and virtually no leakage current. It still requires higher write energy, however, which presents a challenge to memory hierarchy design when energy consumption is a concern. Various techniques for reducing write energy have been studied for single processors, typically focusing on the last-level caches while keeping the first-level caches in CMOS to avoid the write latency. In this work, the use of STT-MRAM for the first-level caches of a multicore processor is motivated by showing that the throughput lost to increased write latency is offset in many cases by the larger cache size that the higher density allows. The PARSEC benchmark suite is run on a modern multicore platform simulator to compare the performance and energy consumption of the spintronic cache system with those of a CMOS design. A small, fully associative level-0 cache (on the order of 8-64 cache lines) is then introduced and shown to hide the STT-MRAM write latency effectively. Performance lost to write latency is recovered or slightly improved, while cache energy consumption is reduced by 30-50% for 12 of the 13 benchmarks.
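To make the level-0 idea concrete, the following is a minimal sketch of a small, fully associative cache sitting in front of a slower-to-write first-level cache. It is not the authors' implementation: the class name `L0Cache`, the LRU replacement policy, the write-back behaviour, and the 64-byte line size are all assumptions chosen for illustration. It only counts hits, misses, and the dirty evictions that would have to be written into the STT-MRAM cache.

```python
from collections import OrderedDict


class L0Cache:
    """Tiny fully associative, write-back cache with LRU replacement.

    Hypothetical model: policy and sizes are assumptions, not taken
    from the paper.
    """

    def __init__(self, num_lines=16, line_bytes=64):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # line address -> dirty flag, oldest first
        self.hits = 0
        self.misses = 0
        self.l1_writes = 0  # dirty evictions written to the next cache level

    def access(self, addr, is_write):
        """Record one access; return True on an L0 hit, False on a miss."""
        line = addr // self.line_bytes
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # mark as most recently used
            if is_write:
                self.lines[line] = True   # line becomes dirty, no L1 write yet
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            _, dirty = self.lines.popitem(last=False)  # evict the LRU line
            if dirty:
                self.l1_writes += 1  # only now is the slow write paid
        self.lines[line] = is_write
        return False


# Repeated writes to one line are absorbed by the L0 cache.
cache = L0Cache(num_lines=8)
for _ in range(100):
    cache.access(0x1000, is_write=True)
print(cache.hits, cache.misses, cache.l1_writes)  # 99 1 0
```

In this model, a burst of writes to the same line reaches the slower cache at most once, when the line is eventually evicted, which is the intuition behind hiding write latency with only a handful of lines.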

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Minnesota - Twin Cities, Minneapolis, MN, 55455, USA
William Tuohy
Department of Electrical and Computer Engineering, University of Minnesota - Twin Cities, Minneapolis, MN, 55455, USA
Cong Ma, Pushkar Nandkar, Nishant Borse & David J. Lilja

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Luís Lopes
Vilnius University, 08663, Vilnius, Lithuania
Julius Žilinskas
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan
Inria, Campus Universitaire de Beaulieu, 35042, Rennes, France
Roberto G. Cascella
MTA SZTAKI, Budapest, Hungary
Gabor Kecskemeti
Inria, LaBRI, France
Emmanuel Jeannot
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
University of Pisa, Italy
Laura Ricci
Faculty of Computer Science, University of Vienna, Wien, Austria
Siegfried Benkner
Universitat Politècnica de València, Spain
Salvador Petit
ISISLab - Dipartimento di Informatica, Università di Salerno, Italy
Vittorio Scarano
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70550, Stuttgart, Germany
José Gracia
Vienna University of Technology, 1040, Vienna, Austria
Sascha Hunold
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
RWTH Aachen University, Aachen, Germany
Stefan Lankes
Department of Informatics and Mathematics, University of Passau, Germany
Christian Lengauer
Universidad Carlos III de Madrid, 28911, Leganés, Spain
Jesús Carretero
TU München, 85747, Garching bei München, Germany
Jens Breitbart
TU Vienna, 1040, Vienna, Austria
Michael Alexander

About this paper

Cite this paper

Tuohy, W., Ma, C., Nandkar, P., Borse, N., Lilja, D.J. (2014). Improving Energy and Performance with Spintronics Caches in Multicore Systems. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_24

DOI: https://doi.org/10.1007/978-3-319-14313-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)
