Discovery of Locality-Improving Refactorings by Reuse Path Analysis
Due to the huge speed gaps in the memory hierarchy of modern computer architectures, it is important that programs maintain good data locality. Improving temporal locality means reducing the distance between uses and reuses of data that are far apart. The best existing tools indicate locality bottlenecks by highlighting both the source location that generates the use and the source location of the subsequent cache-missing reuse. Even with this knowledge of the bottleneck locations in the source code, it often remains hard to find an effective code refactoring that improves temporal locality, because the interaction of the function calls and loop iterations that occur between use and reuse is unclear.
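To make "distance" concrete: a common metric is the reuse distance, the number of distinct memory addresses accessed between a use and the next reuse of the same address. In a fully associative LRU cache holding C lines, a reuse hits exactly when its reuse distance is smaller than C. The sketch below assumes that definition and that cache model; the function names `reuse_distances` and `lru_misses` are hypothetical, and the naive stack scan is only meant for small illustrative traces (practical profilers typically use tree-based structures to keep this fast).

```python
from typing import Hashable, Iterable, List, Optional


def reuse_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return, for every access in `trace`, the number of distinct addresses
    touched since the previous access to the same address (None = first use).

    Naive LRU-stack scan: O(N * M) for N accesses and M distinct addresses.
    """
    stack: List[Hashable] = []  # LRU stack, most recently used address last
    distances: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Every address above `pos` was touched after the previous use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold access: no earlier use
        stack.append(addr)
    return distances


def lru_misses(trace: Iterable[Hashable], capacity: int) -> int:
    """Count misses in a fully associative LRU cache holding `capacity` lines
    (assumed model: a reuse hits iff its reuse distance < capacity)."""
    return sum(1 for d in reuse_distances(trace) if d is None or d >= capacity)


if __name__ == "__main__":
    print(reuse_distances("abcabb"))  # [None, None, None, 2, 2, 0]
    print(lru_misses("abcabb", 2))    # 5
    print(lru_misses("aabcbb", 2))    # 3
```

Reordering the accesses from `a b c a b b` to `a a b c b b` shortens the two long reuses, which removes both of their misses in a two-line cache; finding a source-level change with that kind of effect is what locality refactorings aim at.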
The contributions of this paper are twofold. First, the locality analysis is extended to not only pinpoint the cache bottlenecks, but also to suggest code refactorings that may resolve them. The refactorings are found by analyzing the dynamic hierarchy of function calls and loops on the code path between a use and its reuse, called the reuse path. Second, reservoir sampling of the reuse paths significantly reduces the execution time and memory requirements of profiling, enabling the analysis of realistic programs.
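Reservoir sampling keeps a fixed-size, uniformly random sample from a stream whose length is not known in advance, which is what lets it bound the memory spent on recording reuse paths. The sketch below is the textbook Algorithm R, assuming each reuse path is represented as a tuple of strings naming the calls and loops on it; that representation, the function name `reservoir_sample`, and the example path labels are hypothetical, and the paper's exact sampling scheme may differ (for instance in how reservoirs are organized per reuse pair).

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Algorithm R: return a uniform random sample of at most k items from
    `stream`, using O(k) memory and a single pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # randint is inclusive, so j is uniform over the i + 1 items seen;
            # the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical reuse paths: the calls/loops executed between use and reuse.
    paths = (("main", "loop L1", f"call f#{i}") for i in range(10_000))
    print(reservoir_sample(paths, k=5, rng=random.Random(0)))
```

Because every path seen so far has the same probability of being in the reservoir, statistics computed over the sample (such as which loop or call dominates the reuse paths of a bottleneck) are unbiased estimates, while profiling memory stays fixed regardless of how many reuses the program performs.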
An interactive GUI, called SLO (Suggestions for Locality Optimizations), has been used to explore the most appropriate refactorings in a number of SPEC2000 programs. After refactoring, the execution time of the selected programs was halved on average.
Keywords: Memory Access, Basic Block, Function Call, Nest Loop, Loop Iteration