Improving Cache Utilization Using Acumem VPE

  • Erik Hagersten
  • Mats Nilsson
  • Magnus Vesterlund
Conference paper


The move to multicore offers a steep increase in compute power, while little is done to improve the performance of the memory system. Typically, current applications make poor use of the memory system and few developers have the insight to fix such problems. Furthermore, the introduction of shared memory system resources makes the picture even more complicated.

Acumem Virtual Performance Expert (VPE) automatically identifies wasteful memory access behavior in applications and suggests improvements. It identifies about 20 different types of performance issues related to multi-threaded execution and cache usage, and suggests fixes at a level of detail that allows even novice programmers to carry out optimizations that today require performance experts.

Among other things, Acumem’s technology suggests changes to make cache usage more efficient and to lower memory bandwidth requirements. Most of today’s applications use less than half the data brought into the cache. If the applications could be optimized to use memory efficiently, the cache miss frequency would drop substantially. Other parts of the application would then also benefit from the reduced cache pressure. Based on a small application fingerprint file collected from native execution on one system, the application’s performance on any memory system can be analyzed and improvements to the application suggested.
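The claim that applications use less than half the data brought into the cache can be made concrete with a toy model. The sketch below is not Acumem VPE's analysis; it is a minimal illustration that assumes a 64-byte cache line and uses hypothetical names (`line_utilization`, `CACHE_LINE_BYTES`). It compares a loop that reads one 8-byte field from 32-byte records against the same data packed contiguously.

```python
CACHE_LINE_BYTES = 64  # assumed line size; real hardware may differ


def line_utilization(accesses, line_bytes=CACHE_LINE_BYTES):
    """Return the fraction of fetched bytes that the program actually touches.

    accesses: iterable of (address, size) pairs, one per memory access.
    Every cache line containing at least one touched byte counts as fetched
    in full. This is a toy model: it ignores evictions and prefetching.
    """
    used = set()
    for addr, size in accesses:
        used.update(range(addr, addr + size))
    if not used:
        return 0.0
    fetched_lines = {byte // line_bytes for byte in used}
    return len(used) / (len(fetched_lines) * line_bytes)


# Array of 32-byte records, but the loop reads only one 8-byte field of each.
array_of_structs = [(i * 32, 8) for i in range(1000)]
# The same 8-byte fields stored contiguously in their own array.
packed_fields = [(i * 8, 8) for i in range(1000)]

print(line_utilization(array_of_structs))  # 0.25
print(line_utilization(packed_fields))     # 1.0
```

In this model, the record layout uses only a quarter of every fetched line, so packing the hot field would cut the number of lines fetched by a factor of four. The effect on actual cache misses and bandwidth depends on the hardware and on the rest of the access pattern.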





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Erik Hagersten (1)
  • Mats Nilsson (1)
  • Magnus Vesterlund (1)
  1. Acumem, Uppsala, Sweden
