A Cache-Partitioning Aware Replacement Policy for Chip Multiprocessors

  • Haakon Dybdahl
  • Per Stenström
  • Lasse Natvig
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4297)


Chip multiprocessors (CMPs) usually employ shared last-level caches to use on-chip memory resources effectively. Unfortunately, conventional replacement policies applied to shared caches fail to partition memory resources among cores in a way that achieves optimal execution throughput. This paper presents a novel replacement policy that dynamically estimates how many misses would be eliminated if one more block per set were allocated to a given processor, taking into account the extra misses this would cause for some other processor. Our implementation makes novel use of shadow tags for the estimation. We show that it can yield 50% higher execution throughput on a 4-way CMP and, in contrast to previously proposed schemes, we did not observe any noticeable performance degradation for any of the SPEC2000 applications we used.
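The trade-off described in the abstract, where one processor gains a block per set at the expense of another, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each core keeps two counters per epoch: a `gain` counter (misses that one more block per set would have avoided, for example counted as shadow-tag hits) and a `loss` counter (extra misses expected if the core gave up one block per set). The names `CoreCounters`, `repartition`, and `min_ways`, as well as the counter reset at the end of each epoch, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class CoreCounters:
    gain: int  # assumed: misses avoided if this core had one more block per set
    loss: int  # assumed: extra misses if this core had one block fewer per set
    ways: int  # blocks per set currently allocated to this core


def repartition(cores, min_ways=1):
    """Move one block per set from a donor core to a receiver core when the
    receiver's estimated gain exceeds the donor's estimated loss.

    Returns (receiver_index, donor_index), or None if no move helps.
    Counters are reset afterwards so the next epoch starts fresh.
    """
    # Consider every ordered (receiver, donor) pair whose donor can spare a way.
    candidates = [
        (cores[r].gain - cores[d].loss, r, d)
        for r, d in permutations(range(len(cores)), 2)
        if cores[d].ways > min_ways
    ]
    result = None
    if candidates:
        net, r, d = max(candidates)
        if net > 0:  # only repartition when total misses are expected to drop
            cores[r].ways += 1
            cores[d].ways -= 1
            result = (r, d)
    for c in cores:
        c.gain = 0
        c.loss = 0
    return result
```

In this sketch a move happens only when the net estimate is positive, which mirrors the abstract's point that the benefit to one processor must be weighed against the extra misses imposed on another.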


Cache Size, Replacement Policy, Cache Line, Cache Block, Cache Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Haakon Dybdahl (1)
  • Per Stenström (2)
  • Lasse Natvig (1)
  1. Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
  2. Dept. of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden
