Loopy: Programmable and Formally Verified Loop Transformations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9837)


This paper presents a system, Loopy, for programming loop transformations. Manual loop transformation can be tedious and error-prone, while fully automated methods do not guarantee improvements. Loopy takes a middle path: a programmer specifies a loop transformation at a high level, which is then carried out automatically by Loopy, and formally verified to guard against specification and implementation mistakes. Loopy ’s notation offers considerable flexibility with assembling transformations, while automation and checking prevent errors. Loopy is implemented for the LLVM framework, building on a polyhedral compilation library. Experiments show substantial improvements over fully automated loop transformations, using simple and direct specifications.


Affine Transformation Iteration Space Loop Nest Statement Instance Iteration Domain 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We would like to thank our colleagues at Bell Labs and at the University of Pennsylvania for their helpful comments on this research. This work was supported, in part, by DARPA under agreement number FA8750-12-C-0166. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.


  1. 1.
    Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison Wesley, Boston (2006)MATHGoogle Scholar
  2. 2.
    Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, San Francisco (2001)Google Scholar
  3. 3.
    Banerjee, U.: Loop Transformations for Restructuring Compilers: Dependence analysis. Kluwer (1997)Google Scholar
  4. 4.
    Benabderrahmane, M., Pouchet, L., Cohen, A., Bastoul, C.: The polyhedral model is more widely applicable than you think. In: Compiler Construction, 19th International Conference, CC 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, 20–28 March 2010, Proceedings, pp. 283–303 (2010). http://dx.doi.org/10.1007/978-3-642-11970-5_16
  5. 5.
    Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings ofthe ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, 7–13 June 2008, pp. 101–113 (2008). http://doi.acm.org/10.1145/1375581.1375595
  6. 6.
    Chen, C., Chame, J., Hall, M.: CHiLL: A framework for composing high-level loop transformations. Technical Report 08–897, University of Southern California (2008)Google Scholar
  7. 7.
    Donadio, S., Brodman, J., Roeder, T., Yotov, K., Barthou, D., Cohen, A., Garzarán, M.J., Padua, D.A., Pingali, K.K.: A language for the compact representation of multiple program versions. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339, pp. 136–151. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  8. 8.
    Feautrier, P.: Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program. 21(6), 389–420 (1992). http://dx.doi.org/10.1007/BF01379404 MathSciNetCrossRefMATHGoogle Scholar
  9. 9.
    Girbal, S., Vasilache, N., Bastoul, C., Cohen, A., Parello, D., Sigler, M., Temam, O.: Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Int. J. Parallel Program. 34(3), 261–317 (2006). http://dx.doi.org/10.1007/s10766-006-0012-3 CrossRefMATHGoogle Scholar
  10. 10.
    Grosser, T., Größlinger, A., Lengauer, C.: Polly-performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4) (2012). http://dx.doi.org/10.1142/S0129626412500107
  11. 11.
    Grosser, T., Verdoolaege, S., Cohen, A.: Polyhedral AST generation is more than scanning polyhedra. ACM Trans. Program. Lang. Syst. 37(4) (2015). http://dx.doi.org/10.1145/2743016. Article no. 12
  12. 12.
    Hartono, A., Norris, B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–11 (2009). http://dx.doi.org/10.1109/IPDPS.2009.5161004
  13. 13.
    Intel: Intel Math Kernel Library (MKL) (2016). https://software.intel.com/en-us/intel-mkl/
  14. 14.
    Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Technical Report UMIAS-TR-92-126.1, Univ. of Maryland, College Park, MD, USA (1993)Google Scholar
  15. 15.
    Khan, M.M., Basu, P., Rudy, G., Hall, M.W., Chen, C., Chame, J.: A script-based autotuning compiler system to generate high-performance CUDA code. TACO 9(4), 31 (2013). http://doi.acm.org/10.1145/2400682.2400690 CrossRefGoogle Scholar
  16. 16.
    Lamport, L.: The parallel execution of DO loops. Commun. ACM 17(2), 83–93 (1974). http://doi.acm.org/10.1145/360827.360844 MathSciNetCrossRefMATHGoogle Scholar
  17. 17.
    Lattner, C., Adve, V.S.: LLVM: A compilation framework for lifelong program analysis & transformation. In: CGO, pp. 75–88 (2004). llvm.orgGoogle Scholar
  18. 18.
    Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco (1997)Google Scholar
  19. 19.
    Namjoshi, K.S., Singhania, N.: Loopy: programmable and formally verified loop transformations. Technical Report MS-CIS-16-04, Department of Computer and Information Science, University of Pennsylvania (2016)Google Scholar
  20. 20.
    Pouchet, L.N.: Polybench, the polyhedral benchmark suite (2015). http://polybench.sourceforge.net/
  21. 21.
    Püschel, M., Moura, J.M.F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: SPIRAL: code generation for DSP transforms. Proc. IEEE 93(2), 232–275 (2005). Program Generation, Optimization, and AdaptationCrossRefGoogle Scholar
  22. 22.
    Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In: Proceedings of the 34th ACMSIGPLAN Conference on Programming Language Design and Implementation, pp. 519–530, PLDI 2013. ACM, New York (2013). http://doi.acm.org/10.1145/2491956.2462176
  23. 23.
    Rudy, G., Khan, M.M., Hall, M.W., Chen, C., Chame, J.: A programming language interface to describe transformations and code generation. In: Languages and Compilers for Parallel Computing - 23rd International Workshop, LCPC 2010, Houston, TX, USA, 7–9 October 2010, Revised Selected Papers, pp. 136–150 (2010). http://dx.doi.org/10.1007/978-3-642-19595-2_10
  24. 24.
    Spampinato, D.G., Püschel, M.: A basic linear algebra compiler. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 23:23–23:32, CGO 2014. ACM, New York (2014). http://doi.acm.org/10.1145/2544137.2544155
  25. 25.
    Steuwer, M., Fensch, C., Lindley, S., Dubach, C.: Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. SIGPLAN Not. 50(9), 205–217 (2015). http://doi.acm.org/10.1145/2858949.2784754 CrossRefGoogle Scholar
  26. 26.
    Tiwari, A., Chen, C., Chame, J., Hall, M.W., Hollingsworth, J.K.: A scalable auto-tuning framework for compiler optimization. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–12 (2009). http://dx.doi.org/10.1109/IPDPS.2009.5161054
  27. 27.
    Verdoolaege, S.: isl: an integer set library for the polyhedral model. In: Mathematical Software - ICMS 2010, Third International Congress on Mathematical Software, Kobe, Japan, 13–17 September 2010, Proceedings, pp. 299–302 (2010). http://dx.doi.org/10.1007/978-3-642-15582-6_49
  28. 28.
    Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 1–27, SC 1998. IEEE Computer Society, Washington, DC (1998). http://dl.acm.org/citation.cfm?id=509058.509096
  29. 29.
    Yi, Q.: POET: a scripting language for applying parameterized source-to-source program transformations. Softw. Pract. Exper. 42(6), 675–706 (2012). http://dx.doi.org/10.1002/spe.1089

Copyright information

© Springer-Verlag GmbH Germany 2016

Authors and Affiliations

  1. 1.Bell LaboratoriesNokiaMurray HillUSA
  2. 2.University of PennsylvaniaPhiladelphiaUSA

Personalised recommendations