Efficient Cache Modeling with Sparse Data

  • Erik Hagersten
  • David Eklöv
  • David Black-Schaffer


Obtaining good application performance requires tuning for effective use of the cache hierarchy. However, most tools for analyzing cache usage either generate architecture-specific results (e.g., hardware performance counters) or incur prohibitively high overheads for real-world workloads (e.g., trace-based simulations). This chapter reviews several recently introduced techniques that address these issues by efficiently modeling cache systems and coherent memory hierarchies in an architecturally independent manner. The techniques use only sparse, architecturally independent runtime information that can be collected with an overhead of 10–30%. This information is then processed by statistical models to quickly predict cache behavior across a range of architectures. With these approaches, accurate modeling is possible from data sampled at rates as low as 1 in 10^6 memory accesses.


Keywords: Memory Reference, Cache Size, Cache Line, Sample Window, Target Architecture



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Erik Hagersten (1)
  • David Eklöv (2)
  • David Black-Schaffer (2)
  1. Acumem AB, Uppsala, Sweden
  2. University of Uppsala, Uppsala, Sweden
