Milepost GCC: Machine Learning Enabled Self-tuning Compiler

  • Grigori Fursin
  • Yuriy Kashnikov
  • Abdul Wahid Memon
  • Zbigniew Chamski
  • Olivier Temam
  • Mircea Namolaru
  • Elad Yom-Tov
  • Bilha Mendelson
  • Ayal Zaks
  • Eric Courtois
  • Francois Bodin
  • Phil Barnard
  • Elton Ashton
  • Edwin Bonilla
  • John Thomson
  • Christopher K. I. Williams
  • Michael O’Boyle

Abstract

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to automatically adapting programs to a new architecture using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from gaining widespread adoption in production compilers. Machine learning has been proposed to tune optimizations across programs systematically, but current approaches are limited to a few transformations, suffer from long training phases, and critically lack publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure that automatically learns the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly available open-source machine-learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve the execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology, together with the low-level ICI-inspired plugin framework, is now included in mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. On average, we reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for the Berkeley DB library using Milepost GCC, improving execution time by approximately 17% while reducing compilation time and code size by 12% and 7%, respectively, on an Intel Xeon processor.
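
The prediction step can be pictured as matching a new program's feature vector against programs whose best optimizations are already known. As a rough illustration only, and not the authors' actual probabilistic or transductive models, the following Python sketch reuses the flag combination of the nearest training program; the feature values and flag sets are hypothetical placeholders.

```python
# Minimal sketch (not the MILEPOST implementation): predict a promising
# combination of GCC flags for a new program by finding the training
# program with the most similar static feature vector and reusing the
# flag combination that worked best for it.
import math

# Hypothetical training data: static feature vectors (e.g. counts of
# basic blocks, branches, memory accesses) paired with the best flag
# combination previously found by iterative search for that program.
training = [
    ([52.0, 17.0, 9.0],  ["-O3", "-funroll-loops"]),
    ([12.0, 44.0, 30.0], ["-O2", "-fno-gcse", "-finline-functions"]),
    ([80.0, 5.0, 2.0],   ["-Os"]),
]

def normalize(v):
    """Scale a feature vector to unit length so that raw program size
    does not dominate the similarity measure."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def distance(a, b):
    """Euclidean distance between two normalized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_flags(features):
    """Return the flag combination of the nearest training program."""
    f = normalize(features)
    _, flags = min(
        ((distance(f, normalize(train_f)), train_flags)
         for train_f, train_flags in training),
        key=lambda pair: pair[0],
    )
    return flags

if __name__ == "__main__":
    # Feature vector of a new, unseen program (hypothetical values).
    print(predict_flags([48.0, 20.0, 11.0]))
```

In Milepost GCC itself, the features are extracted by ICI-based plugins and the optimization data is exchanged with the cTuning.org repository; the sketch above only conveys the general shape of the feature-to-optimization mapping.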

Keywords

Machine learning compiler · Self-tuning compiler · Adaptive compiler · Automatic performance tuning · Machine learning · Program characterization · Program features · Collective optimization · Continuous optimization · Multi-objective optimization · Empirical performance tuning · Optimization repository · Iterative compilation · Feedback-directed compilation · Adaptive compilation · Optimization prediction · Portable optimization



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Grigori Fursin (1, 2)
  • Yuriy Kashnikov (2)
  • Abdul Wahid Memon (2)
  • Zbigniew Chamski (1)
  • Olivier Temam (1)
  • Mircea Namolaru (3)
  • Elad Yom-Tov (3)
  • Bilha Mendelson (3)
  • Ayal Zaks (3)
  • Eric Courtois (4)
  • Francois Bodin (4)
  • Phil Barnard (5)
  • Elton Ashton (5)
  • Edwin Bonilla (6)
  • John Thomson (6)
  • Christopher K. I. Williams (6)
  • Michael O’Boyle (6)

  1. INRIA Saclay, Parc Club Orsay Université, Orsay, France
  2. University of Versailles Saint-Quentin-en-Yvelines, Versailles, France
  3. IBM Research Lab, Haifa University Campus, Haifa, Israel
  4. CAPS Entreprise, Rennes, France
  5. ARC International, St. Albans, UK
  6. University of Edinburgh, Informatics Forum, Edinburgh, UK
