Analysis of the Spatial and Temporal Locality in Data Accesses

  • Jie Tao
  • Siegfried Schloissnig
  • Wolfgang Karl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


Cache optimization is becoming increasingly important for achieving high computing performance, especially on current and future chip-multiprocessor (CMP) systems, which usually show a higher cache miss ratio than uniprocessors. Such optimization requires information about access locality to help the user with data allocation, data transformation, and code transformation, which are often used to improve the utilization of cached data and thereby raise the cache hit rate.

In this paper we demonstrate an analysis tool that detects the spatial and temporal relationships between memory accesses and provides information, such as access pattern and access stride, that is required for applying optimization techniques like address grouping, software prefetching, and code transformation. Based on the memory access trace generated by a code instrumentor, the analysis tool uses appropriate algorithms to detect repeated address sequences and constant distances between accesses to different elements of a data structure. This allows users to pack data with spatial locality into the same cache block so that the needed data are loaded into the cache together. In addition, the analysis tool computes the push back distance, which shows how a cache miss can be avoided by reusing the data before it is replaced. This helps to reduce cache misses and thereby increases the temporal reuse of the working set.
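The two kinds of analysis described above can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names, the 64-byte block size, and the plain list-of-addresses trace format are assumptions made for illustration. Because the abstract does not define the push back distance precisely, the second function computes the standard LRU reuse distance (the number of distinct cache blocks touched since the previous access to the same block) as a related, swapped-in measure.

```python
from collections import Counter


def dominant_stride(addresses):
    """Return (stride, share): the most common difference between
    consecutive addresses and the fraction of steps that use it."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return None, 0.0
    stride, count = Counter(deltas).most_common(1)[0]
    return stride, count / len(deltas)


def reuse_distances(addresses, block_size=64):
    """For each access, return the number of distinct cache blocks touched
    since the previous access to the same block (None on first touch).
    block_size is an assumed value, not taken from the paper."""
    stack = []  # LRU stack of block numbers, most recently used last
    result = []
    for addr in addresses:
        block = addr // block_size
        if block in stack:
            idx = stack.index(block)
            # Blocks above idx were touched after the last use of `block`.
            result.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            result.append(None)
        stack.append(block)
    return result


if __name__ == "__main__":
    # Hypothetical trace: a walk over 8-byte elements, then a revisit.
    print(dominant_stride([0, 8, 16, 24, 32]))
    print(reuse_distances([0, 64, 128, 0]))
```

In a fully associative LRU cache holding N blocks, an access whose reuse distance is N or more misses, so long distances point to reuses that would have to be moved closer together to hit in the cache.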


Keywords: Position Holder, Code Transformation, Cache Block, Small Pattern, Reuse Distance



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jie Tao (1)
  • Siegfried Schloissnig (1)
  • Wolfgang Karl (1)
  1. Institut für Technische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany
