Approximate Cache Architectures

  • Natalie Enright Jerger
  • Joshua San Miguel
Chapter

Abstract

In this chapter, we explore the application of approximate computing techniques to caches and the memory access portion of the processor pipeline. Because memory accesses contribute significantly to the latency and energy consumption of applications, they have long been the target of various optimizations. Large cache hierarchies are a mainstay of modern designs, since they avoid the long latency and high energy of accessing DRAM on every load or store request. As data set sizes grow, however, building ever larger caches is not necessarily an effective use of silicon real estate. We present recent work that improves the effectiveness of cache storage and reduces the cost of memory accesses by exploiting the inherently noisy or imprecise data that many applications operate on. First, we consider work that selectively forgoes loading data from the caches and memory when the processor can make a reasonable estimate of the value that is needed. Next, we explore work that selectively determines which values to store in the cache through approximate deduplication of data; reducing how much data must be stored increases the effective cache capacity.
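To make the idea of approximate deduplication concrete, the following is a minimal toy sketch in Python and not the design described in the chapter. It assumes that two blocks count as "similar" when their values quantize to the same coarse signature. The class name `ApproxDedupCache`, the `step` parameter, and the signature function are all hypothetical choices made for illustration; eviction and write-back are left out.

```python
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]


class ApproxDedupCache:
    """Toy model: many addresses may share one stored block if their values are similar."""

    def __init__(self, step: float) -> None:
        self.step = step  # quantization step (hypothetical tuning knob)
        self.tags: Dict[int, Signature] = {}          # address -> signature
        self.data: Dict[Signature, List[float]] = {}  # signature -> stored block

    def _signature(self, block: List[float]) -> Signature:
        # Coarsely quantize each value; similar blocks collapse to one signature.
        return tuple(round(v / self.step) for v in block)

    def store(self, addr: int, block: List[float]) -> None:
        sig = self._signature(block)
        # Keep only the first block seen for a signature; later similar
        # blocks reuse it instead of consuming another data entry.
        self.data.setdefault(sig, list(block))
        self.tags[addr] = sig

    def load(self, addr: int) -> List[float]:
        # Returns the shared block, which may differ slightly from what was stored.
        return self.data[self.tags[addr]]


cache = ApproxDedupCache(step=0.5)
cache.store(0x100, [1.0, 2.0])
cache.store(0x140, [1.1, 2.1])  # similar to the block at 0x100
cache.store(0x180, [5.0, 6.0])  # dissimilar, needs its own data entry
```

In this sketch three addresses are tracked but only two data blocks are stored, which is the sense in which the effective capacity grows. The cost is that a load from `0x140` returns the block first stored at `0x100`, so the scheme only suits data that tolerates small errors.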

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Toronto, Toronto, Canada
  2. University of Wisconsin-Madison, Madison, USA