Analysis of the Spatial and Temporal Locality in Data Accesses
Cache optimization is becoming increasingly important for achieving high computing performance, especially on current and future chip-multiprocessor (CMP) systems, which usually exhibit higher cache miss ratios than uniprocessors. Such optimization requires information about access locality to guide the user in data allocation, data transformation, and code transformation, techniques commonly used to improve the utilization of cached data and thereby the cache hit rate.
In this paper we demonstrate an analysis tool that detects spatial and temporal relationships between memory accesses and provides information, such as access patterns and access strides, that is required for applying optimization techniques like address grouping, software prefetching, and code transformation. Working on the memory access trace generated by a code instrumentor, the tool applies dedicated algorithms to detect repeated address sequences and constant distances between accesses to different elements of a data structure. This allows users to pack data with spatial locality into the same cache block, so that data needed together is loaded into the cache at the same time. In addition, the tool computes the push back distance, which shows how a cache miss can be avoided by reusing the data before it is replaced. This helps reduce cache misses and thus increases the temporal reusability of the working set.
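The abstract does not give the tool's algorithms. As a hedged illustration only, the sketch below shows two standard trace analyses of the kind described: per-instruction stride detection (the dominant address difference between consecutive accesses by the same instruction) and LRU reuse distance at cache-block granularity. The trace format of (instruction id, address) pairs, the 64-byte block size, the 0.8 dominance threshold, and the names `detect_strides`, `reuse_distances`, and `lru_hits` are all assumptions. Reuse distance is the textbook metric and is not claimed to be the paper's push back distance.

```python
from collections import Counter, OrderedDict, defaultdict


def detect_strides(trace, min_fraction=0.8):
    """Report a constant stride per instruction.

    trace: iterable of (instr_id, address) pairs (assumed trace format).
    An instruction gets a stride only if one address delta accounts for at
    least `min_fraction` of its consecutive-access deltas.
    """
    last_addr = {}
    deltas = defaultdict(Counter)
    for instr, addr in trace:
        if instr in last_addr:
            deltas[instr][addr - last_addr[instr]] += 1
        last_addr[instr] = addr
    strides = {}
    for instr, counter in deltas.items():
        stride, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= min_fraction:
            strides[instr] = stride
    return strides


def reuse_distances(addresses, block_size=64):
    """LRU stack (reuse) distance per access, at cache-block granularity.

    Returns the number of distinct blocks touched since the previous access
    to the same block, or None for a first (cold) access.
    """
    stack = OrderedDict()  # least recently used block first, most recent last
    result = []
    for addr in addresses:
        block = addr // block_size
        if block in stack:
            keys = list(stack)
            # Blocks after this one in the stack were used more recently.
            result.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            result.append(None)
            stack[block] = True
    return result


def lru_hits(distances, capacity_blocks):
    """Hit/miss per access for an assumed fully associative LRU cache."""
    return [d is not None and d < capacity_blocks for d in distances]


if __name__ == "__main__":
    # Hypothetical trace: instruction "A" walks an array of 16-byte records.
    one_pass = [("A", 1000 + 16 * i) for i in range(8)]
    print(detect_strides(one_pass))  # {'A': 16}
    addrs = [addr for _, addr in one_pass] * 2  # traverse the array twice
    dists = reuse_distances(addrs, block_size=64)
    print(sum(lru_hits(dists, capacity_blocks=2)))  # 10 of 16 accesses hit
    print(sum(lru_hits(dists, capacity_blocks=3)))  # 13 of 16 accesses hit
```

Under the assumed fully associative LRU cache of C blocks, an access hits exactly when its reuse distance is below C. This is why reusing data sooner, before enough distinct blocks intervene to evict it, turns misses into hits, and why packing elements accessed together into one block (the stride information) lets a single fill serve several accesses.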
Keywords: Position Holder, Code Transformation, Cache Block, Small Pattern, Reuse Distance