Compiler Optimizations: Machine Learning versus O3

  • Yuriy Kashnikov
  • Jean Christophe Beyler
  • William Jalby
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)

Abstract

Software engineers are highly dependent on compiler technology to create efficient programs. Execution time is currently the most important criterion in the HPC field, and to optimize it users typically apply the common compiler option -O3. This paper extensively tests the other performance options available and concludes that, although old compiler versions could benefit from specific compiler flag combinations, modern compilers perform admirably at the commonly used -O3 level.

The paper presents the Universal Learning Machine (ULM) framework, which combines several tools to predict the best flags from data gathered offline. The ULM framework evaluates three hundred kernels extracted from 144 benchmark applications and automatically processes more than ten thousand compiler flag combinations for each kernel. To perform a complete study, the experimental setup includes three modern mainstream compilers and four different architectures. For 62% of the kernels, the optimal flag is the generic optimization level -O3.
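
To make the offline stage concrete, the following is a purely illustrative Python sketch of exhaustively timing one kernel under candidate flag combinations; the kernel file, compiler invocation, and flag sets are hypothetical examples, not the ULM framework's actual tooling or search space.

    # Illustrative sketch only: time one kernel under candidate flag
    # combinations. File name, compiler, and flag sets are hypothetical.
    import subprocess
    import time

    candidate_flag_sets = [
        ["-O3"],
        ["-O2", "-funroll-loops"],
        ["-O3", "-ftree-vectorize", "-ffast-math"],
    ]

    results = {}
    for flags in candidate_flag_sets:
        # Compile the kernel with the candidate combination (gcc assumed).
        subprocess.run(["gcc", *flags, "kernel.c", "-o", "kernel"], check=True)
        # Time the resulting binary; a real setup would repeat runs and use
        # hardware counters rather than a single wall-clock measurement.
        start = time.perf_counter()
        subprocess.run(["./kernel"], check=True)
        results[" ".join(flags)] = time.perf_counter() - start

    best = min(results, key=results.get)
    print("best flag combination:", best, "->", results[best], "s")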

For the remaining 38% of kernels, an extension to the ULM framework allows a user to instantly obtain the optimal flag combination through a static prediction method. The prediction method examines well-known machine learning algorithms, including Nearest Neighbor, Stochastic Gradient Descent, and Support Vector Machines (SVM). SVM gave the best results, achieving a 92% prediction accuracy on the considered kernels.
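
A minimal sketch of the static prediction idea, assuming a scikit-learn SVM classifier: static kernel features are mapped to the index of a flag combination. The feature vectors, labels, and model parameters below are placeholders, not the features or training data used by ULM.

    # Minimal sketch of static flag prediction with an SVM classifier
    # (scikit-learn assumed); all data below is placeholder, not ULM's.
    import numpy as np
    from sklearn.svm import SVC

    # Each row: static features of a kernel (e.g., loop count, array
    # accesses, arithmetic intensity); each label: best flag-set index.
    X_train = np.array([[4, 12, 0.8], [1, 3, 0.2], [6, 20, 1.5], [2, 5, 0.4]])
    y_train = np.array([0, 1, 0, 1])

    model = SVC(kernel="rbf", C=1.0)
    model.fit(X_train, y_train)

    # Predict the best flag combination for an unseen kernel's features.
    new_kernel = np.array([[5, 15, 1.1]])
    print("predicted flag-set index:", model.predict(new_kernel)[0])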

Keywords

compilers, optimization, machine learning, performance modeling, high performance computing



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yuriy Kashnikov (1, 3)
  • Jean Christophe Beyler (2, 3)
  • William Jalby (1, 3)
  1. Université de Versailles Saint-Quentin-en-Yvelines, France
  2. Intel France, France
  3. Exascale Computing Research Center, France
