Improving Power Efficiency with an Asymmetric Set-Associative Cache

  • Zhigang Hu
  • Stefanos Kaxiras
  • Margaret Martonosi

Abstract

Data caches are widely used in general-purpose processors to hide long memory latencies. Set-associativity in these caches helps programs avoid performance problems caused by cache-mapping conflicts. Current set-associative caches are symmetric in the sense that each way has the same number of cache lines. Moreover, all ways are searched in parallel, so every way consumes energy even though at most one way will hit. With this in mind, this chapter proposes an asymmetric cache structure in which the size of each way can be different. The way sizes are different powers of two, which allows for a “tree-structured” cache in which extra associativity can be shared. We accomplish this by having two cache blocks from a larger way align with an individual cache block in the next smaller way. This structure achieves performance comparable to a conventional cache of similar size and equal associativity. Most notably, in the asymmetric cache an access that hits in a smaller way can immediately terminate the accesses to the larger ways, which saves power. For the SPEC2000 benchmarks, we found that cache energy per access was reduced by as much as 23% on average. The characteristics of the asymmetric set-associative design (low power, uncompromised performance, compact layout) make it particularly attractive for low-power processors.
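The lookup idea in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the chapter's design: the class name `AsymmetricCache`, the helper `aligned_slots`, the modulo index function, the round-robin fill policy, and the use of "ways probed" as a stand-in for energy are all hypothetical choices made for illustration. Early termination is approximated here by probing the ways from smallest to largest and stopping at the first hit.

```python
def aligned_slots(index, small_lines, large_lines):
    """Slots of a larger way that map onto slot `index` of a smaller way."""
    return [i for i in range(large_lines) if i % small_lines == index]


class AsymmetricCache:
    """Toy model: each way is a direct-mapped array with a power-of-two size."""

    def __init__(self, way_lines=(64, 128, 256), line_bytes=32):
        # Hypothetical geometry: smallest way first, each way twice the previous.
        assert all(n > 0 and n & (n - 1) == 0 for n in way_lines)
        self.way_lines = list(way_lines)
        self.line_bytes = line_bytes
        self.ways = [[None] * n for n in way_lines]  # each slot holds a block number
        self.accesses = 0
        self.hits = 0
        self.probes = 0   # number of ways looked up (crude energy proxy)
        self._victim = 0  # round-robin fill pointer (an assumption)

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then filled)."""
        block = addr // self.line_bytes
        self.accesses += 1
        # Probe the smallest way first and stop at the first hit.
        for way, n in zip(self.ways, self.way_lines):
            self.probes += 1
            if way[block % n] == block:
                self.hits += 1
                return True
        # Miss: place the block in the next way in round-robin order.
        w = self._victim
        self._victim = (self._victim + 1) % len(self.ways)
        self.ways[w][block % self.way_lines[w]] = block
        return False


cache = AsymmetricCache(way_lines=(4, 8, 16), line_bytes=32)
for addr in (0, 0, 128, 128, 0):
    cache.access(addr)
print(cache.hits, cache.probes, cache.accesses * len(cache.ways))
```

In this model, a conventional cache that searches every way on every access would perform `accesses * len(ways)` lookups, so the gap between that figure and `probes` gives a rough feel for where the savings come from. It says nothing about the actual energy numbers reported for SPEC2000.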

Keywords

Migration 

Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Zhigang Hu (1)
  • Stefanos Kaxiras (2)
  • Margaret Martonosi (3)

  1. T.J. Watson Research Center, Armonk, USA
  2. Communication System and Software, Agere Systems, Allentown, USA
  3. Department of Electrical Engineering, Princeton University, Princeton, USA
