Sleepy-LRU: extending the lifetime of non-volatile caches by reducing activity of age bits

  • Seyedeh Golsana Ghaemi
  • Iman Ahmadpour
  • Mehdi Ardebili
  • Hamed Farbeh

Abstract

Emerging non-volatile memories (NVMs) are promising alternatives to SRAM in on-chip caches. However, their limited write endurance is a major challenge in these frequently written caches: early wear-out of NVM cells makes the cache lifetime far too short for today's computing systems. Previous studies addressed only the lifetime of the data part of the cache. This paper first demonstrates that the age bits field used by the cache replacement algorithm is the most frequently written part of a cache block and that its lifetime is more than 27\(\times\) shorter than that of the data part. Second, it investigates the effect of age-bit wear-out on cache operation and shows that performance degrades severely once even a small portion of the age bits becomes non-operational. Third, it proposes a novel cache replacement algorithm, Sleepy-LRU, that reduces the write activity of the age bits with negligible overhead. The evaluations show that Sleepy-LRU extends the lifetime of instruction and data caches by 3.63\(\times\) and 3.00\(\times\), respectively, at an average performance overhead of 0.06%. In addition, Sleepy-LRU imposes no area or power consumption overhead.
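To make the first observation concrete, the following is a minimal sketch of why age bits can be written far more often than data under a conventional counter-based LRU policy. It is a toy model and not the paper's Sleepy-LRU algorithm or its evaluation setup; the names `Block`, `LRUSet`, and `count_writes` are hypothetical, and the model assumes one age counter per block (0 = most recently used) and counts a write only when a stored value actually changes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    tag: Optional[int] = None
    age: int = 0          # LRU age counter; 0 means most recently used
    age_writes: int = 0   # how many times the age field changed
    data_writes: int = 0  # how many times the data field was written


class LRUSet:
    """One cache set with counter-based LRU (illustrative assumption)."""

    def __init__(self, ways: int) -> None:
        # Distinct initial ages so the counters always form a permutation.
        self.blocks = [Block(age=i) for i in range(ways)]

    @staticmethod
    def _set_age(block: Block, new_age: int) -> None:
        # Count a write only if the stored age value really changes.
        if block.age != new_age:
            block.age = new_age
            block.age_writes += 1

    def access(self, tag: int, is_store: bool) -> bool:
        hit = next((b for b in self.blocks if b.tag == tag), None)
        if hit is None:
            # Miss: evict the oldest block and fill it (one data write).
            target = max(self.blocks, key=lambda b: b.age)
            target.tag = tag
            target.data_writes += 1
        else:
            target = hit
            if is_store:
                target.data_writes += 1  # loads that hit never write data
        # Promote target to MRU; every block that was younger ages by one.
        old_age = target.age
        for b in self.blocks:
            if b is not target and b.age < old_age:
                self._set_age(b, b.age + 1)
        self._set_age(target, 0)
        return hit is not None


def count_writes(trace: List[Tuple[int, bool]], ways: int) -> Tuple[int, int]:
    """Return (total age-field writes, total data writes) for one set."""
    cache_set = LRUSet(ways)
    for tag, is_store in trace:
        cache_set.access(tag, is_store)
    age = sum(b.age_writes for b in cache_set.blocks)
    data = sum(b.data_writes for b in cache_set.blocks)
    return age, data


if __name__ == "__main__":
    # Two blocks read alternately in a 2-way set: only the two fills write
    # data, but every access rewrites both age counters.
    trace = [(1, False), (2, False), (1, False), (2, False)]
    print(count_writes(trace, ways=2))  # (8, 2)
```

In this toy trace, read hits leave the data untouched yet still rewrite the age counters, which is the kind of imbalance the abstract describes. The actual ratio in a real cache depends on the workload, associativity, and age-bit encoding.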

Keywords

Age bits, Lifetime, Non-volatile caches, Replacement algorithm, Write endurance

Notes

Funding

Funding was provided by Iran’s National Elites Foundation.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Sharif University of Technology (SUT), Tehran, Iran
  2. Tehran University (TU), Tehran, Iran
  3. Amirkabir University of Technology, Tehran, Iran
  4. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
