Collecting and Exploiting Cache-Reuse Metrics

  • Josef Weidendorfer
  • Carsten Trinitis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3515)


The growing gap between processor and main memory performance underlines the need for cache optimizations, especially in memory-intensive applications. Tools that can localize code regions with a high cache miss ratio seem appropriate for guiding memory access optimizations. However, a programmer often does not know what to do with the collected information. We try to improve this situation by providing cache reuse metrics, which are intended to give more precise hints on how to optimize memory access behavior. We enhanced the cache simulator Callgrind to report metrics on temporal and spatial cache utilization for a given memory block, relating this information to the code line where the block was loaded into the cache. We show what is needed for hardware-supported measurement of such metrics, and give example code where the collected information points directly to optimization possibilities.
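As a rough illustration of what per-block temporal and spatial reuse metrics could look like, the sketch below simulates a small fully associative LRU cache and, when a line is evicted, credits its hit count and the number of distinct bytes touched to the site that loaded it. This is not Callgrind's implementation: the class `ReuseCache`, the 64-byte line size, the LRU policy, and the site labels are all assumptions made for the example.

```python
from collections import OrderedDict, defaultdict

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


class ReuseCache:
    """Hypothetical fully associative LRU cache that records per-line reuse
    and attributes it to the site that loaded the line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # line number -> [loading site, hits after the load, touched byte offsets]
        self.lines = OrderedDict()
        self.stats = defaultdict(lambda: {"loads": 0, "hits": 0, "bytes_used": 0})

    def _retire(self, entry):
        # Credit the evicted line's reuse to the site that loaded it.
        site, hits, touched = entry
        s = self.stats[site]
        s["loads"] += 1
        s["hits"] += hits
        s["bytes_used"] += len(touched)

    def access(self, addr, size, site):
        # Simplification: an access is assumed not to cross a line boundary.
        line, offset = divmod(addr, LINE_SIZE)
        touched = range(offset, min(offset + size, LINE_SIZE))
        if line in self.lines:
            entry = self.lines[line]
            entry[1] += 1                  # temporal reuse: one more hit
            entry[2].update(touched)       # spatial use: more bytes touched
            self.lines.move_to_end(line)   # mark as most recently used
        else:
            if len(self.lines) >= self.num_lines:
                _, victim = self.lines.popitem(last=False)  # evict LRU line
                self._retire(victim)
            self.lines[line] = [site, 0, set(touched)]

    def flush(self):
        # Retire all resident lines so their reuse is counted.
        while self.lines:
            _, victim = self.lines.popitem(last=False)
            self._retire(victim)

    def report(self):
        # site -> (average hits per loaded line, average fraction of line used)
        return {
            site: (s["hits"] / s["loads"],
                   s["bytes_used"] / (s["loads"] * LINE_SIZE))
            for site, s in self.stats.items()
        }


cache = ReuseCache(num_lines=4)
for i in range(16):                  # one 8-byte read per 64-byte line
    cache.access(i * 64, 8, "strided")
for i in range(128):                 # consecutive 8-byte reads
    cache.access(0x10000 + i * 8, 8, "sequential")
cache.flush()
print(cache.report())
```

In this toy run the strided site shows no temporal reuse and uses one eighth of each loaded line, while the sequential site reuses every line seven times and touches all of its bytes. Low values of this kind, reported per load site, are the sort of hint that might suggest a data layout or loop order change.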


Keywords: Cache Reuse Metrics, Profiling, Cache Simulation



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Josef Weidendorfer (1)
  • Carsten Trinitis (1)
  1. Technische Universität München, Germany
