
Sleepy-LRU: extending the lifetime of non-volatile caches by reducing activity of age bits

The Journal of Supercomputing

Abstract

Emerging non-volatile memories (NVMs) are promising alternatives to SRAM in on-chip caches. However, their limited write endurance is a major challenge when NVMs are used in these frequently written caches: early wear-out of NVM cells leaves the caches with a lifetime far too short for today's computing systems. Previous studies addressed only the lifetime of the data part of the cache. This paper first demonstrates that the age-bits field used by the cache replacement algorithm is the most frequently written part of a cache block, and that its lifetime is more than 27× shorter than that of the data part. Second, it investigates how age-bit wear-out affects cache operation and shows that performance degrades severely once even a small portion of the age bits becomes non-operational. Third, it proposes a novel cache replacement algorithm, called Sleepy-LRU, that reduces the write activity of the age bits with negligible overhead. The evaluations show that Sleepy-LRU extends the lifetime of the instruction and data caches by 3.63× and 3.00×, respectively, with an average performance overhead of 0.06%. In addition, Sleepy-LRU imposes no area or power overhead.
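Why the age bits wear out first can be illustrated with a small simulation. This is a minimal sketch, assuming a conventional counter-based LRU in which each way of an N-way set stores an age in [0, N-1] (0 = most recently used); the abstract does not describe the paper's exact baseline encoding. On every hit or fill, the accessed way and all ways younger than it have their ages rewritten, whereas the data field is written only on fills and write hits. The `LruSet` class, the `simulate` function, and the uniform random trace are hypothetical. The sketch models plain LRU, not Sleepy-LRU, and its write ratio is not meant to reproduce the paper's 27× figure.

```python
import random


class LruSet:
    """One set of an N-way cache using counter-based LRU age bits.

    Each way holds an age in [0, ways-1]; 0 is MRU, ways-1 is LRU.
    An age-field write is counted whenever a way's age value changes;
    a data-field write is counted on every fill and every write hit.
    """

    def __init__(self, ways):
        self.ways = ways
        self.tags = [None] * ways
        self.ages = list(range(ways))  # ages always form a permutation
        self.age_writes = [0] * ways
        self.data_writes = [0] * ways

    def _touch(self, way):
        """Promote `way` to MRU; every way younger than it ages by one."""
        old = self.ages[way]
        for w in range(self.ways):
            if w != way and self.ages[w] < old:
                self.ages[w] += 1
                self.age_writes[w] += 1
        if old != 0:
            self.ages[way] = 0
            self.age_writes[way] += 1

    def access(self, tag, is_write):
        if tag in self.tags:  # hit
            way = self.tags.index(tag)
            if is_write:
                self.data_writes[way] += 1
        else:  # miss: evict the LRU way and fill it (write-allocate)
            way = self.ages.index(self.ways - 1)
            self.tags[way] = tag
            self.data_writes[way] += 1
        self._touch(way)


def simulate(ways=8, n_accesses=100_000, n_tags=12, write_ratio=0.3, seed=1):
    """Run a hypothetical uniform random trace; return total write counts."""
    rng = random.Random(seed)
    s = LruSet(ways)
    for _ in range(n_accesses):
        s.access(rng.randrange(n_tags), rng.random() < write_ratio)
    return sum(s.age_writes), sum(s.data_writes)


age, data = simulate()
print(f"age-field writes: {age}, data-field writes: {data}, ratio: {age / data:.1f}")
```

Counting value changes per way only approximates the cell-level writes an NVM array would see, but it captures the key asymmetry: a single access can rewrite the age field of several ways in the set, while it writes at most one data field.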





Funding

Funding was provided by Iran’s National Elites Foundation.

Author information

Correspondence to Hamed Farbeh.

About this article

Cite this article

Ghaemi, S.G., Ahmadpour, I., Ardebili, M. et al. Sleepy-LRU: extending the lifetime of non-volatile caches by reducing activity of age bits. J Supercomput 75, 3945–3974 (2019). https://doi.org/10.1007/s11227-019-02758-0

  • DOI: https://doi.org/10.1007/s11227-019-02758-0
