Discovery of Locality-Improving Refactorings by Reuse Path Analysis

  • Kristof Beyls
  • Erik H. D’Hollander
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4208)


Because of the large speed gaps in the memory hierarchy of modern computer architectures, it is important that programs maintain good data locality. Improving temporal locality means shortening the distance between data reuses that are far apart. The best existing tools indicate locality bottlenecks by highlighting both the source location that generates the use and the one that generates the subsequent cache-missing reuse. Even when the bottleneck locations in the source code are known, it often remains hard to find an effective refactoring that improves temporal locality, because the interaction of the function calls and loop iterations that occur between use and reuse is unclear.
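
As an illustration of the "distance of data reuses" mentioned above, the following minimal sketch computes a reuse distance for every access in a small trace. It assumes the common definition (the number of distinct addresses touched between a use and the next reuse of the same address); the abstract itself does not spell out the definition, and the function name `reuse_distances` and the toy trace are hypothetical. The quadratic scan is for clarity only and is not how a real profiler would do it.

```python
def reuse_distances(trace):
    """Return one entry per access in `trace`.

    The entry is None for the first access to an address; otherwise it is
    the number of distinct addresses touched between the previous access
    to the same address (the use) and this one (the reuse).
    Assumed definition; simple O(n^2) scan for illustration only.
    """
    last_pos = {}      # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            # Accesses strictly between the use and the reuse.
            between = trace[last_pos[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_pos[addr] = i
    return distances


# Toy trace of symbolic addresses (hypothetical data).
trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(trace))
```

Under this definition, a reuse with a large distance is likely to miss in the cache, and a refactoring that brings use and reuse closer together lowers the distance.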

The contributions of this paper are twofold. First, the locality analysis is enhanced not only to pinpoint the cache bottlenecks, but also to suggest code refactorings that may resolve them. The refactorings are found by analyzing the dynamic hierarchy of function calls and loops on the code path between a use and its reuse, called the reuse path. Second, reservoir sampling of the reuse paths significantly reduces the execution time and memory requirements during profiling, which enables the analysis of realistic programs.
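
The idea behind reservoir sampling can be sketched as follows. This is the textbook "Algorithm R", shown only to illustrate how a fixed-size uniform sample is kept from a stream of unknown length; it is not necessarily the variant used by the authors, and the names `reservoir_sample`, `stream`, `k` and `rng` are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from `stream`.

    Memory use is O(k) regardless of the stream length, which is why the
    technique suits profiling runs that would otherwise have to store
    every observed item. Textbook Algorithm R, for illustration only.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n replaces a random slot with probability k / (n + 1).
            j = rng.randint(0, n)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 10 items out of 1000 hypothetical reuse-path records.
sample = reservoir_sample(range(1000), 10, random.Random(0))
print(len(sample))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing profiles before and after a refactoring.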

An interactive GUI, called SLO (Suggestions for Locality Optimizations), has been used to explore the most appropriate refactorings in a number of SPEC2000 programs. After refactoring, the execution time of the selected programs was halved on average.


Keywords: Memory Access, Basic Block, Function Call, Nest Loop, Loop Iteration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kristof Beyls (1)
  • Erik H. D’Hollander (1)
  1. Department of Electronics and Information Systems (ELIS), Ghent University, Ghent, Belgium
