Abstract
Software engineers depend heavily on compiler technology to produce efficient programs. In the HPC field, execution time is currently the most important criterion; to minimize it, users typically apply the common compiler option -O3. This paper extensively tests the other available performance options and concludes that, although older compiler versions could benefit from combinations of compiler flags, modern compilers perform admirably at the commonly used -O3 level.
The paper presents the Universal Learning Machine (ULM) framework, which combines several tools to predict the best flags from data gathered offline. The ULM framework evaluates three hundred kernels extracted from 144 benchmark applications, automatically processing more than ten thousand compiler flag combinations for each kernel. To make the study comprehensive, the experimental setup includes three modern mainstream compilers and four different architectures. For 62% of kernels, the optimal flag is the generic optimization level -O3.
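The offline exploration described above amounts to compiling and timing each kernel under many flag combinations and keeping the fastest. A minimal driver-loop sketch follows; the kernel file, flag pool, and use of `gcc` are illustrative assumptions, not the ULM implementation:

```python
import itertools
import subprocess
import time

# Hypothetical pool of optimization flags to toggle on top of a base level.
FLAG_POOL = ["-funroll-loops", "-ftree-vectorize", "-ffast-math"]

def time_kernel(flags):
    """Compile a kernel with the given flags and return its wall-clock run time."""
    subprocess.run(["gcc", "-O2", *flags, "kernel.c", "-o", "kernel"], check=True)
    start = time.perf_counter()
    subprocess.run(["./kernel"], check=True)
    return time.perf_counter() - start

def explore():
    """Measure every subset of FLAG_POOL and return the fastest combination."""
    best_flags, best_time = None, float("inf")
    for r in range(len(FLAG_POOL) + 1):
        for combo in itertools.combinations(FLAG_POOL, r):
            elapsed = time_kernel(list(combo))
            if elapsed < best_time:
                best_flags, best_time = list(combo), elapsed
    return best_flags, best_time
```

With a pool of n flags this enumerates 2^n subsets per kernel, which is why the paper's exhaustive search over ten thousand combinations per kernel must be done offline.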
For the remaining 38% of kernels, an extension to the ULM framework lets a user instantly obtain the optimal flag combination through a static prediction method. The prediction method examines three well-known machine learning algorithms: Nearest Neighbor, Stochastic Gradient Descent, and Support Vector Machines (SVM). ULM achieved its best results with SVM, reaching a 92% accuracy rate on the considered kernels.
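The static prediction step maps a kernel's features to the flag combination that offline search found best. Of the algorithms the paper compares, Nearest Neighbor is the simplest to sketch in pure Python; the feature vectors (loop depth, memory accesses, arithmetic intensity) and flag labels below are invented placeholders, not ULM's actual feature set:

```python
import math

# Hypothetical training data: static features per kernel, paired with the
# flag combination that offline exhaustive search found fastest for it.
TRAINING = [
    ([2, 10, 0.5], "-O3"),
    ([1, 3, 2.0], "-O2 -funroll-loops"),
    ([4, 25, 0.1], "-O3 -ffast-math"),
]

def predict_flags(features):
    """Return the flag combination of the closest known kernel (1-NN)."""
    best_label, best_dist = None, math.inf
    for known, label in TRAINING:
        dist = math.dist(features, known)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

print(predict_flags([1, 4, 1.9]))  # → -O2 -funroll-loops
```

An SVM, which the paper found most accurate, replaces this distance lookup with a learned decision boundary but keeps the same interface: features in, flag combination out, with no recompilation or timing needed at prediction time.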
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Kashnikov, Y., Beyler, J.C., Jalby, W. (2013). Compiler Optimizations: Machine Learning versus O3. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_3
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0