Natural Computing, Volume 12, Issue 3, pp 411–428

On modeling contention for shared caches in multi-core processors with techniques from ecology

Abstract

Multi-core x86_64 processors introduced an important architectural change: a shared last-level cache. Historically, each processor had access to a large private cache that interfaced with main memory seamlessly and transparently to end users. Previously, processes or threads had to compete only for memory bandwidth; now they compete for actual cache space. Competition for space and environmental resources is a problem studied in other scientific domains. This paper introduces methods from ecology to model multi-core cache usage with the competitive Lotka–Volterra equations. A model is presented and validated for characterizing the interaction of cores through shared caching, and for predicting the degree to which different workloads will interfere with each other's execution through cache contention.
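
As a rough illustration of the modeling idea, the sketch below integrates the textbook form of the competitive Lotka–Volterra equations, dx_i/dt = r_i x_i (1 - sum_j alpha_ij x_j / K_i), for two co-running workloads. The abstract does not give the paper's parameterization, so everything here is an assumption made for illustration: x_i is read as the cache occupancy of workload i in abstract units, K_i as the occupancy it would reach running alone, r_i as how quickly it fills the cache, and alpha_ij as how strongly workload j displaces workload i. The function names and all numeric values are hypothetical.

```python
def competitive_lv_step(x, r, K, alpha, dt):
    """One forward-Euler step of dx_i/dt = r_i * x_i * (1 - sum_j alpha[i][j] * x_j / K[i])."""
    n = len(x)
    new_x = []
    for i in range(n):
        # Total competitive pressure on workload i, including its own occupancy.
        pressure = sum(alpha[i][j] * x[j] for j in range(n))
        dx = r[i] * x[i] * (1.0 - pressure / K[i])
        # Occupancy cannot go negative.
        new_x.append(max(x[i] + dt * dx, 0.0))
    return new_x


def simulate(x0, r, K, alpha, dt=0.01, steps=20000):
    """Integrate from x0 for `steps` Euler steps and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = competitive_lv_step(x, r, K, alpha, dt)
    return x


# Hypothetical parameters (not taken from the paper).
K = [6.0, 4.0]                       # occupancy each workload reaches when run alone
r = [1.0, 0.6]                       # cache fill rates
alpha = [[1.0, 0.4],                 # effect of workload 2 on workload 1 is 0.4
         [0.5, 1.0]]                 # effect of workload 1 on workload 2 is 0.5

occupancy = simulate([0.1, 0.1], r, K, alpha)
print(occupancy)
```

With these assumed values the two workloads coexist, and the state settles at the point where both growth terms vanish (x_1 + 0.4 x_2 = 6 and 0.5 x_1 + x_2 = 4, that is x_1 = 5.5 and x_2 = 1.25). Under this reading, the gap between each workload's solo occupancy K_i and its shared equilibrium is one possible measure of interference. Forward Euler is used only to keep the sketch dependency-free; an adaptive ODE solver would be the more usual choice.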

Keywords

Performance modeling, Multi-core cache, Shared cache, Performance profiling, Lotka–Volterra

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA
  2. Clemson University, Clemson, USA
