International Journal of Parallel Programming, Volume 41, Issue 5, pp 704–750

Predictive Modeling in a Polyhedral Optimization Space

  • Eunjung Park
  • John Cavazos
  • Louis-Noël Pouchet
  • Cédric Bastoul
  • Albert Cohen
  • P. Sadayappan

Abstract

High-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved to be very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling perfectly and imperfectly nested loops. It models arbitrarily complex sequences of loop transformations in a unified mathematical framework, dramatically increasing the expressiveness (and expected effectiveness) of the loop optimization stage. Nevertheless, identifying the most effective loop transformations remains a major challenge: current state-of-the-art heuristics in polyhedral frameworks fail to deliver good performance over a wide range of numerical applications. Their lack of effectiveness is mainly due to simplistic performance models that do not reflect the complexity of today's processors (CPU, cache behavior, etc.). We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. We decouple the problem of selecting good complex sequences of optimizations into two stages: (1) we narrow the set of candidate optimizations using static cost models to select the loop transformations that implement specific high-level optimizations (e.g., tiling, parallelism); (2) we predict the performance of each high-level complex optimization sequence with trained models that take as input a performance-counter characterization of the original program. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms. We investigate a variety of machine learning algorithms and hardware counters, and we obtain performance improvements over production compilers ranging on average from \(3.2\times\) to \(8.7\times\), by running no more than \(6\) program variants from a polyhedral optimization space.
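The second stage described in the abstract (predict the performance of each candidate optimization sequence from a performance-counter characterization, then run only a handful of variants) can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-nearest-neighbour predictor, the function names `predict_speedups` and `select_and_run`, the `run_variant` callback, and the sequence labels are all hypothetical stand-ins chosen for illustration. Only the budget of 6 variants comes from the abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed data layout (hypothetical): each training program is a pair of
# (performance-counter vector, {optimization sequence: measured speedup}).
Counters = Sequence[float]
TrainingSet = List[Tuple[Counters, Dict[str, float]]]


def predict_speedups(train: TrainingSet, counters: Counters) -> Dict[str, float]:
    """Stand-in model: reuse the speedups of the most similar training program.

    A 1-nearest-neighbour lookup on squared Euclidean distance; the paper
    evaluates several learning algorithms, and this is only one simple choice.
    """
    def dist(a: Counters, b: Counters) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, speedups = min(train, key=lambda item: dist(item[0], counters))
    return speedups


def select_and_run(
    train: TrainingSet,
    counters: Counters,
    run_variant: Callable[[str], float],
    budget: int = 6,
) -> Tuple[str, float, List[str]]:
    """Rank sequences by predicted speedup, run the top `budget`, keep the best."""
    predicted = predict_speedups(train, counters)
    ranked = sorted(predicted, key=predicted.get, reverse=True)[:budget]
    # Only the shortlisted variants are actually compiled and executed.
    measured = {seq: run_variant(seq) for seq in ranked}
    best = max(measured, key=measured.get)
    return best, measured[best], ranked
```

In this sketch `run_variant` is assumed to compile and time one program variant and return its measured speedup over the baseline. The ranking step is what keeps the iterative search short: prediction errors are tolerated as long as a good variant lands within the first few candidates that are actually run.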

Keywords

Loop transformation, Polyhedral optimization, Iterative compilation, Machine learning, Performance counters

Notes

Acknowledgments

This work was funded in part by the U.S. National Science Foundation through awards 0926688, 0811781, 0811457, 0926687 and 0926127, the Defense Advanced Research Projects Agency through AFRL Contract FA8650-09-C-7915, the DARPA Computer Science Study Group (CSSG), the U.S. Department of Energy through award DE-FC02-06ER25755, and NSF Career award 0953667.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Eunjung Park (1)
  • John Cavazos (1)
  • Louis-Noël Pouchet (3, 2)
  • Cédric Bastoul (4)
  • Albert Cohen (5)
  • P. Sadayappan (2)
  1. University of Delaware, Newark, USA
  2. The Ohio State University, Columbus, USA
  3. University of California Los Angeles, Los Angeles, USA
  4. LRI, University of Paris-Sud 11, Orsay Cedex, France
  5. INRIA Paris-Rocquencourt / ENS, Le Chesnay Cedex, France
