The Journal of Supercomputing, Volume 73, Issue 6, pp 2402–2429

Multi-cache resizing via greedy coordinate descent

Abstract

To reduce power consumption in CPUs, researchers have studied dynamic cache resizing. However, existing techniques resize only a single cache within a uniprocessor or the shared last-level cache (LLC) within a multi-core CPU. To maximize benefits, it is necessary to resize all caches, which in today’s CPUs include one or two private caches per core and a shared LLC. Such multi-cache resizing (MCR) is challenging because the multiple resizing decisions are coupled, yielding an enormous configuration space. In this paper, we present a dynamic MCR technique that uses search-based optimization. Our main contribution is a set of heuristics that enable the search to find the best configuration rapidly. In particular, our search moves in a coordinate descent (Manhattan) fashion across the configuration space. At each search step, we select the next cache for resizing greedily based on a power efficiency gain metric. To further enhance search speed, we permit parallel greedy selection. Across 60 multi-programmed workloads, our technique reduces power by 13.9% while sacrificing 1.5% of performance.
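The greedy coordinate descent idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `greedy_resize`, the toy `power` and `slowdown` models, the slowdown budget, and the gain metric (power saved per unit of added slowdown) are all assumptions made for illustration, and parallel greedy selection is omitted. Each step changes exactly one cache (one coordinate) by one size setting, choosing the cache with the highest gain.

```python
from typing import Callable, List, Optional, Tuple


def greedy_resize(
    levels: List[int],
    power: Callable[[List[int]], float],
    slowdown: Callable[[List[int]], float],
    budget: float,
) -> List[int]:
    """Shrink one cache per step, picking the best gain (hypothetical metric).

    levels[i] is the number of size settings for cache i.
    config[i] == 0 means cache i is at full size; larger values are smaller.
    """
    config = [0] * len(levels)
    while True:
        base_power = power(config)
        base_slow = slowdown(config)
        best: Optional[Tuple[float, List[int]]] = None
        for i in range(len(config)):
            if config[i] + 1 >= levels[i]:
                continue  # cache i is already at its smallest setting
            cand = config.copy()
            cand[i] += 1  # move along a single coordinate only
            cand_slow = slowdown(cand)
            if cand_slow > budget:
                continue  # would exceed the allowed performance loss
            saved = base_power - power(cand)
            if saved <= 0:
                continue  # shrinking this cache saves no power
            # Assumed gain metric: power saved per unit of extra slowdown.
            gain = saved / max(cand_slow - base_slow, 1e-9)
            if best is None or gain > best[0]:
                best = (gain, cand)
        if best is None:
            return config  # no feasible improving move remains
        config = best[1]


# Toy models (made-up numbers): two private caches and one shared LLC.
SAVE = [2, 2, 10]   # power saved per shrink step, arbitrary units
COST = [10, 2, 20]  # slowdown added per shrink step, arbitrary units


def toy_power(cfg: List[int]) -> float:
    return 100 - sum(s * c for s, c in zip(SAVE, cfg))


def toy_slowdown(cfg: List[int]) -> float:
    return sum(k * c for k, c in zip(COST, cfg))


if __name__ == "__main__":
    print(greedy_resize([3, 3, 3], toy_power, toy_slowdown, budget=15))
```

In this toy setup the search shrinks the second cache twice, because its slowdown cost is lowest, then the first cache once, and never touches the LLC, since a single LLC step would exceed the budget.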

Keywords

Cache resizing, Multi-core CPUs, Search-based optimization, Power-efficient computing

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Samsung, San Jose, USA
  2. University of Maryland at College Park, College Park, USA
