Polyhedral Code Generation in the Real World

  • Nicolas Vasilache
  • Cédric Bastoul
  • Albert Cohen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3923)


The polyhedral model is known to be a powerful framework for reasoning about high-level loop transformations. Recent developments in optimizing compilers have overturned some generally accepted ideas about the limitations of this model. First, thanks to advances in dependence analysis for irregular access patterns, its applicability, once thought to be limited to very simple loop nests, has been extended to wide code regions. Second, new algorithms have made it possible to compute the target code for hundreds of statements, although this code generation step was expected not to scale. These theoretical advances and new software tools have allowed actors from both academia and industry to study more complex and realistic cases. Unfortunately, even when a given transformation has strong optimization potential, e.g., for parallelism or data locality, code generation may still be challenging or result in high control overhead. This paper presents scalable code generation methods that make it possible to apply increasingly complex program transformations. By studying the transformations themselves, we show how their properties can be exploited to dramatically improve both code generation quality and space/time complexity with respect to the best state-of-the-art code generation tool. In addition, we build on these improvements to present a new algorithm that improves generated code performance for strided domains and reindexed schedules.
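To make the notion of control overhead on strided domains concrete, the following minimal sketch contrasts two ways of scanning the one-dimensional domain { i | lo <= i <= hi, i ≡ offset (mod stride) }. It is an illustration only and not the algorithm presented in the paper; the function names and the guard counter are hypothetical, and a positive stride is assumed.

```python
def scan_with_guard(lo, hi, stride, offset):
    """Visit every integer in [lo, hi] and filter with a modulo guard."""
    visited = []
    guards = 0
    for i in range(lo, hi + 1):
        guards += 1  # one conditional is evaluated on every iteration
        if (i - offset) % stride == 0:
            visited.append(i)
    return visited, guards


def scan_strided(lo, hi, stride, offset):
    """Visit only the points of the strided domain; no guard in the loop."""
    # Smallest point >= lo that is congruent to offset modulo stride
    # (Python's % is non-negative for a positive stride).
    start = lo + (offset - lo) % stride
    visited = list(range(start, hi + 1, stride))
    return visited, 0


if __name__ == "__main__":
    print(scan_with_guard(0, 20, 3, 1))
    print(scan_strided(0, 20, 3, 1))
```

Both functions enumerate the same points in the same order; the first pays for one test per integer in the bounding interval, while the second folds the stride into the loop step and lower bound, which is the kind of saving a code generator aims for when it handles strides directly.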


Keywords: Code Generation; Polyhedral Model; Target Code; Node Fusion; Loop Interchange



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicolas Vasilache (1)
  • Cédric Bastoul (1)
  • Albert Cohen (1)
  1. ALCHEMY Group, INRIA Futurs and LRI, Université Paris-Sud XI, France
