Using Padding to Optimize Locality in Scientific Applications

  • E. Herruzo
  • O. Plata
  • E. L. Zapata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5101)


Exploiting program locality is key to reducing the execution time of scientific applications, and many techniques have therefore been designed for locality optimization. This paper presents new compiler algorithms based on array padding that optimize program locality either locally (at loop level) or globally (across the whole program). We first introduce a formal cache model that is used to analyze how all cache levels are filled when arrays inside nested loops are referenced. We then study the relation between the model parameters and the memory layout of the arrays, and define how to pad those arrays in order to optimize cache occupancy at all levels. Experimental evaluation on several numerical benchmarks shows the benefits of our approach.
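As a rough illustration of why padding can change cache occupancy, the sketch below counts how many cache sets a column walk over a row-major array touches, with and without padding each row. It is not the paper's algorithm or cache model: the helper names `cache_set` and `distinct_sets_in_column_walk` are hypothetical, and the cache geometry (direct-mapped, 64-byte lines, 512 sets, 8-byte elements, array starting at address 0) is an assumption chosen only to make the effect visible.

```python
def cache_set(address, line_bytes, num_sets):
    """Set index of a byte address in a direct-mapped cache (assumed geometry)."""
    return (address // line_bytes) % num_sets


def distinct_sets_in_column_walk(rows, row_elems, pad_elems,
                                 elem_bytes=8, line_bytes=64, num_sets=512):
    """Count the distinct cache sets touched when reading column 0 of a
    row-major array whose rows are extended by pad_elems unused elements."""
    stride = (row_elems + pad_elems) * elem_bytes  # bytes between consecutive rows
    return len({cache_set(i * stride, line_bytes, num_sets) for i in range(rows)})


# Row length of 4096 doubles equals the assumed cache size (32 KiB),
# so every row start maps to the same set when no padding is used.
unpadded = distinct_sets_in_column_walk(rows=256, row_elems=4096, pad_elems=0)

# Padding each row by one cache line (8 doubles) shifts successive rows
# to successive sets.
padded = distinct_sets_in_column_walk(rows=256, row_elems=4096, pad_elems=8)

print(unpadded, padded)
```

Under these assumptions the unpadded walk keeps hitting a single set, while the padded walk spreads across one set per row. A real compiler pass would have to pick the pad size from the actual cache parameters and the reference pattern of the loop nest.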


Keywords (machine-generated, not supplied by the authors): Loop Nest; Array Reference; Cache Block; Cache Level; Innermost Loop



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • E. Herruzo (1)
  • O. Plata (2)
  • E. L. Zapata (2)
  1. Dept. Electronics, University of Córdoba, Spain
  2. Dept. Computer Architecture, University of Málaga, Spain
