Analytical Bounds for Optimal Tile Size Selection

  • Jun Shirako
  • Kamal Sharma
  • Naznin Fauzia
  • Louis-Noël Pouchet
  • J. Ramanujam
  • P. Sadayappan
  • Vivek Sarkar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7210)

Abstract

In this paper, we introduce a novel approach to guide tile size selection: analytical models confine the empirical search to a subspace of the full search space. Two analytical models are used together: 1) an existing conservative model, based on the data footprint of a tile, which ignores intra-tile cache block replacement, and 2) an aggressive new model that assumes optimal cache block replacement within a tile. Experimental results on multiple platforms demonstrate the practical effectiveness of the approach, which reduces the search space for the optimal tile size by 1,307× to 11,879× on an Intel Core 2 Quad system, 358× to 1,978× on an Intel Nehalem system, and 45× to 1,142× on an IBM Power7 system. Rectangularly tiled code tuned by searching the subspace identified by our model achieves speed-ups of up to 1.40× (Intel Core 2 Quad), 1.28× (Intel Nehalem), and 1.19× (IBM Power7) relative to the best possible square tile sizes on the same processor architectures. We also demonstrate the integration of the analytical bounds with existing search optimization algorithms. Our approach not only reduces the total search time of the Nelder-Mead Simplex and Parallel Rank Ordering methods by factors of up to 4.95× and 4.33×, respectively, but also finds better tile sizes that yield higher performance in the tuned tiled code.
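
As a rough illustration of how two analytical bounds might prune a tile size search, the following minimal Python sketch filters candidate tiles for a tiled matrix multiplication `C[i][j] += A[i][k] * B[k][j]`. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the `slack` threshold, the `i`, `k`, `j` intra-tile loop order, and in particular the formula in `optimistic_working_set`, which is only a simple stand-in for a model that assumes ideal cache block replacement.

```python
from itertools import product


def conservative_footprint(ti, tj, tk):
    """Distinct elements touched by one (ti, tj, tk) tile of
    C[i][j] += A[i][k] * B[k][j]: a tile of A, a tile of B, a tile of C."""
    return ti * tk + tk * tj + ti * tj


def optimistic_working_set(ti, tj, tk):
    """Hypothetical stand-in for an optimal-replacement model.

    Assumes the intra-tile loop order i, k, j: reuse then only needs the
    whole B tile, one row of the C tile, and one element of A to stay cached.
    """
    return tk * tj + tj + 1


def bounded_search_space(candidates, capacity, slack=0.5):
    """Keep tiles that are neither clearly too small nor clearly too large.

    capacity is the cache size in elements; slack is an arbitrary,
    illustrative threshold below which a tile is treated as underusing
    the cache according to the conservative model.
    """
    kept = []
    for ti, tj, tk in product(candidates, repeat=3):
        too_small = conservative_footprint(ti, tj, tk) < slack * capacity
        too_large = optimistic_working_set(ti, tj, tk) > capacity
        if not too_small and not too_large:
            kept.append((ti, tj, tk))
    return kept


if __name__ == "__main__":
    sizes = [8, 16, 32, 64, 128]
    cache_elements = 4096  # e.g. 32 KB of 8-byte doubles (assumed)
    space = bounded_search_space(sizes, cache_elements)
    print(f"{len(space)} of {len(sizes) ** 3} candidate tiles remain")
```

An empirical tuner would then time only the tiles in the returned list, rather than every combination of candidate sizes. Note that the filter keeps rectangular tiles such as `(128, 8, 8)` alongside square ones.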

Keywords

Search Space; Cache Line; Memory Hierarchy; Analytical Bound; Tile Size

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jun Shirako (1)
  • Kamal Sharma (1)
  • Naznin Fauzia (2)
  • Louis-Noël Pouchet (2)
  • J. Ramanujam (3)
  • P. Sadayappan (2)
  • Vivek Sarkar (1)

  1. Rice University, USA
  2. The Ohio State University, USA
  3. Louisiana State University, USA
