Encyclopedia of Parallel Computing

2011 Edition
Editors: David Padua


  • François Irigoin
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-09766-4_511



Tiling is a program transformation used to improve the spatial and/or temporal memory locality of a loop nest by changing its iteration order, and/or to reduce its synchronization or communication overhead by controlling the granularity of its parallel execution. Tiling adds some control overhead because the number of loops is doubled, and reduces the amount of parallelism available in the outermost loops. The n initial loops are replaced by n outer loops used to enumerate the tiles and n inner loops used to execute all the iterations within a tile.
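As a minimal sketch of the loop doubling described above, the following Python fragment tiles a 2-deep loop nest with rectangular tiles. The function names `untiled` and `tiled` and the tile-size parameters `ti` and `tj` are illustrative assumptions, not notation from the entry. The loop body only records the visiting order, so no data dependences constrain the reordering; for a real loop nest the legality of tiling would have to be checked separately.

```python
def untiled(n, m):
    """Original 2-deep loop nest: visit every (i, j) in row-major order."""
    order = []
    for i in range(n):
        for j in range(m):
            order.append((i, j))
    return order


def tiled(n, m, ti, tj):
    """Same iterations after rectangular tiling with tiles of ti x tj points.

    ti and tj are hypothetical tile-size parameters chosen for illustration.
    """
    order = []
    # Two outer loops enumerate the tiles by their origin (ii, jj).
    for ii in range(0, n, ti):
        for jj in range(0, m, tj):
            # Two inner loops execute all iterations within the current tile.
            # min() clips partial tiles at the edge of the iteration space.
            for i in range(ii, min(ii + ti, n)):
                for j in range(jj, min(jj + tj, m)):
                    order.append((i, j))
    return order
```

Both functions visit exactly the same set of iterations; only the order differs. In the tiled version, all points of one tile are executed before the next tile starts, which is the source of the improved locality, and the two loops became four, which is the source of the added control overhead.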



Tiling is useful for most recent parallel computer architectures, with shared or distributed memory, since they all rely on locality to exploit their memory hierarchies and on parallelism to exploit several cores. It is also useful for heterogeneous architectures with hardware accelerators, and for...




Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • François Irigoin
    Centre de recherche en informatique, Mathématiques et systèmes, MINES ParisTech/CRI, Fontainebleau, France