DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems

  • Magnus Jahre
  • Marius Grannaes
  • Lasse Natvig
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5952)


Chip Multi-Processors (CMPs) commonly share hardware-controlled on-chip units that are unaware that memory requests are issued by independent processors. Consequently, the resources a process receives will vary depending on the behavior of the processes it is co-scheduled with. Resource allocation techniques can avoid this problem if they are provided with an accurate interference estimate. Our Dynamic Interference Estimation Framework (DIEF) provides this service by dynamically estimating the latency a process would experience with exclusive access to all hardware-controlled, shared resources. Since the total interference latency is the sum of the interference latency in each shared unit, the system designer can choose estimation techniques to achieve the desired accuracy/complexity trade-off. In this work, we provide high-accuracy estimation techniques for the on-chip interconnect, shared cache and memory bus. This DIEF implementation has an average relative estimate error between -0.4% and 4.7% and a standard deviation between 2.4% and 5.8%.
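The additive structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `InterferenceSample`, `estimate_private_latency`, and `relative_error` are hypothetical, the numbers are invented for illustration, and two points are assumptions rather than statements from the abstract: that the private-mode latency estimate is the measured shared-mode latency minus the total interference, and that relative error is signed as (estimate - actual) / actual.

```python
from dataclasses import dataclass


@dataclass
class InterferenceSample:
    """Per-unit interference latency estimates, in cycles (hypothetical container)."""
    interconnect: float
    shared_cache: float
    memory_bus: float

    def total(self) -> float:
        # The abstract states that total interference latency is the sum of
        # the interference latency in each shared unit.
        return self.interconnect + self.shared_cache + self.memory_bus


def estimate_private_latency(shared_latency: float, sample: InterferenceSample) -> float:
    # Assumption: the latency with exclusive access is the measured
    # shared-mode latency minus the estimated interference.
    return shared_latency - sample.total()


def relative_error(estimate: float, actual: float) -> float:
    # Assumption: signed relative error, positive when the estimate is too high.
    return (estimate - actual) / actual


# Illustrative values only; they are not taken from the paper.
sample = InterferenceSample(interconnect=10.0, shared_cache=25.0, memory_bus=40.0)
private_estimate = estimate_private_latency(200.0, sample)
error = relative_error(private_estimate, actual=120.0)
```

Because each unit contributes an independent term to the sum, any one estimator could be swapped for a cheaper or more accurate one without touching the others, which is the accuracy/complexity trade-off the abstract refers to.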





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Magnus Jahre
  • Marius Grannaes
  • Lasse Natvig

All authors are affiliated with the Norwegian University of Science and Technology.
