International Journal of Parallel Programming, Volume 31, Issue 4, pp 305–338

Restructuring Computations for Temporal Data Cache Locality

Authors

  • Venkata K. Pingali
    • Information Sciences Institute, University of Southern California
  • Sally A. McKee
    • Electrical and Computer Engineering, Cornell University
  • Wilson C. Hsieh
    • School of Computing, University of Utah
  • John B. Carter
    • School of Computing, University of Utah
Article

DOI: 10.1023/A:1024556711058

Cite this article as:
Pingali, V.K., McKee, S.A., Hsieh, W.C. et al. International Journal of Parallel Programming (2003) 31: 305. doi:10.1023/A:1024556711058

Abstract

Data access costs contribute significantly to the execution time of applications with complex data structures. As the latency of memory accesses becomes high relative to processor cycle times, application performance is increasingly limited by memory performance. In some situations it is useful to trade increased computation costs for reduced memory costs. The contributions of this paper are three-fold: we provide a detailed analysis of the memory performance of seven memory-intensive benchmarks; we describe Computation Regrouping, a source-level approach to improving the performance of memory-bound applications by increasing temporal locality to eliminate cache and TLB misses; and we demonstrate significant performance improvement by applying Computation Regrouping to our suite of seven benchmarks. Using Computation Regrouping, we observe a geometric mean speedup of 1.90, with individual speedups ranging from 1.26 to 3.03. Most of this improvement comes from eliminating memory stall time.
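The general idea of trading extra computation for better temporal locality can be illustrated with a small sketch. The code below is not taken from the paper: the function names, the request format, and the bucketing strategy are all hypothetical, chosen only to show what reordering work by the data it touches might look like. In Python the cache effect itself is not measurable this way; the sketch shows only that the regrouped schedule visits each structure in one contiguous run and still produces the same results.

```python
from collections import defaultdict

def run_in_arrival_order(tables, requests):
    """Handle each (table_id, key) request as it arrives, hopping between tables."""
    return [tables[table_id][key] for table_id, key in requests]

def run_regrouped(tables, requests):
    """Hypothetical regrouped schedule: bucket requests by table, then drain one table at a time."""
    buckets = defaultdict(list)
    for position, (table_id, key) in enumerate(requests):
        # Extra bookkeeping: this is the added computation cost.
        buckets[table_id].append((position, key))

    results = [None] * len(requests)
    for table_id, pending in buckets.items():
        table = tables[table_id]  # all work on this table happens together
        for position, key in pending:
            # Write each result back to its original slot so output order is unchanged.
            results[position] = table[key]
    return results

# Illustrative data only (not from the paper's benchmarks).
tables = {"a": {1: "a1", 2: "a2"}, "b": {1: "b1", 2: "b2"}}
requests = [("a", 1), ("b", 2), ("a", 2), ("b", 1)]
```

Calling `run_regrouped(tables, requests)` returns the same list as `run_in_arrival_order(tables, requests)`; only the order in which the tables are visited differs. In a compiled language with large data structures, that reordering is the part that could reduce cache and TLB misses.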

Memory performance, data structures, optimization

Copyright information

© Plenum Publishing Corporation 2003