Loop Nest Tiling for Image Processing and Communication Applications

  • Wlodzimierz BieleckiEmail author
  • Marek Palkowski
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 534)


Loop nest tiling is one of the most important loop nest optimizations. This paper presents a practical framework for automatic tiling of affine loop nests to reduce time of application execution which is crucial for the quality of image processing and communication systems. Our framework is derived via a combination of the Polyhedral and Iteration Space Slicing models and uses the transitive closure of loop nest dependence graphs. To describe and implement the approach in the source-to-source TRACO compiler, loop dependences are presented in the form of tuple relations. We expose the applicability of the framework to generate tiled code for image analysis, encoding and communication program loop nests from the UTDSP Benchmark Suite. Experimental results demonstrate the speed-up of optimized tiled programs generated by means of the approach implemented in TRACO.


Optimizing compilers Tiling Transitive closure Data locality Image processing Communication 


  1. 1.
    Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT 2013 IEEE International Conference on Parallel Architecture and Compilation Techniques, pp. 7–16, Juan-les-Pins, September 2004Google Scholar
  2. 2.
    Beletska, A., Bielecki, W., Cohen, A., Palkowski, M., Siedlecki, K.: Coarse-grained loop parallelization: iteration space slicing vs affine transformations. Parallel Comput. 37, 479–497 (2011)CrossRefGoogle Scholar
  3. 3.
    Benabderrahmane, M.-W., Pouchet, L.-N., Cohen, A., Bastoul, C.: The polyhedral model is more widely applicable than you think. In: Gupta, R. (ed.) CC 2010. LNCS, vol. 6011, pp. 283–303. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-11970-5_16 CrossRefGoogle Scholar
  4. 4.
    Bielecki, W., Palkowski, M.: Coarse-grained loop parallelization for image processing and communication applications. In: Choraś, R.S. (ed.) Image Processing and Communications Challenges 2, pp. 307–314. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  5. 5.
    Bielecki, W., Palkowski, M.: Perfectly nested loop tiling transformations based on the transitive closure of the program dependence graph. In: Wiliński, A., Fray, I., Pejaś, J. (eds.) Soft Computing in Computer and Information Science. AISC, vol. 342, pp. 309–320. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-15147-2_26 Google Scholar
  6. 6.
    Bielecki, W., Palkowski, M., Klimek, T.: Free scheduling for statement instances of parameterized arbitrarily nested affine loops. Parallel Comput. 38(9), 518–532 (2012)CrossRefGoogle Scholar
  7. 7.
    Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. SIGPLAN Not. 43(6), 101–113 (2008)CrossRefGoogle Scholar
  8. 8.
    Chang, C.W., Chen, J.J., Kuo, T.W., Falk, H.: Real-time partitioned scheduling on multi-core systems with local and global memories. In: 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 467–472, January 2013Google Scholar
  9. 9.
    Feautrier, P.: Some efficient solutions to the affine scheduling problem: I. one-dimensional time. Int. J. Parallel Program. 21(5), 313–348 (1992)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Feautrier, P.: Some efficient solutions to the affine scheduling problem: II. multi-dimensional time. Int. J. Parallel Program. 21(5), 389–420 (1992)MathSciNetCrossRefzbMATHGoogle Scholar
  11. 11.
    Griebl, M.: Automatic parallelization of loop programs for distributed memory architectures (2004)Google Scholar
  12. 12.
    Grosser, T., Verdoolaege, S., Cohen, A.: Polyhedral AST generation is more than scanning polyhedra. ACM Trans. Program. Lang. Syst. 37(4), 12:1–12:50 (2015)CrossRefGoogle Scholar
  13. 13.
    Irigoin, F., Triolet, R.: Supernode partitioning. In: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 319–329, POPL 1988, NY, USA. ACM, New York (1988)Google Scholar
  14. 14.
    Kelly, W., Maslov, V., Pugh, W., Rosser, E., Shpeisman, T., Wonnacott, D.: The omega library interface guide. Technical report, College Park, MD, USA (1995)Google Scholar
  15. 15.
    Lim, A., Cheong, G.I., Lam, M.S.: An affine partitioning algorithm to maximize parallelism and minimize communication. In: Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, pp. 228–237. ACM Press (1999)Google Scholar
  16. 16.
    Murray, A.C., Bennett, R.V., Franke, B., Topham, N.: Code transformation and instruction set extension. ACM Trans. Embed. Comput. Syst. 8(4), 26:1–26:31 (2009)CrossRefGoogle Scholar
  17. 17.
    Palkowski, M.: Impact of variable privatization on extracting synchronization-free slices for multi-core computers. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge III. LNCS, vol. 7686, pp. 72–83. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-35893-7_7 CrossRefGoogle Scholar
  18. 18.
    Park, E., Pouchet, L.N., Cavazos, J., Cohen, A., Sadayappan, P.: Predictive modeling in a polyhedral optimization space. In: 9th IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011), pp. 119–129. IEEE Computer Society press, Chamonix, France, April 2011Google Scholar
  19. 19.
    en Peng, S.H.: UTDSP: A VLIW programmable DSP processor. Thesis, University of Toronto (1999)Google Scholar
  20. 20.
    Pouchet, L.N., Bastoul, C., Cohen, A., Vasilache, N.: Iterative optimization in the polyhedral model: Part i, one-dimensional time and part ii, multi-dimensional time. In: International Symposium on Code Generation and Optimization (CGO 2007), pp. 144–156, March 2007Google Scholar
  21. 21.
    Pouchet, L.N., Bastoul, C., Cavazos, J., Cohen, A.: A note on the performance distribution of affine schedules. In: 2nd Workshop on Statistical and Machine Learning Approaches to Architectures and Compilation (SMART 2008), Göteborg, Sweden, January 2008Google Scholar
  22. 22.
    Pugh, W., Wonnacott, D.: An exact method for analysis of value-based array data dependences. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1993. LNCS, vol. 768, pp. 546–566. Springer, Heidelberg (1994). doi: 10.1007/3-540-57659-2_31 CrossRefGoogle Scholar
  23. 23.
    Verdoolaege, S.: Integer set library - manual. Technical report (2011).
  24. 24.
    Wonnacott, D.G., Strout, M.M., Wonnacott, D.G., Strout, M.M.: On the scalability of loop tiling techniques. In: Proceedings of the 3rd International Workshop on Polyhedral Compilation Techniques (IMPACT) (2013)Google Scholar
  25. 25.
    Xue, J.: On tiling as a loop transformation. Parallel Process. Lett. 7(4), 409–424 (1997)MathSciNetCrossRefGoogle Scholar
  26. 26.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.Faculty of Computer Science and Information SystemsWest Pomeranian University of Technology in SzczecinSzczecinPoland

Personalised recommendations