Quick and Practical Run-Time Evaluation of Multiple Program Optimizations

Fursin, Grigori; Cohen, Albert; O’Boyle, Michael; Temam, Olivier

doi:10.1007/978-3-540-71528-3_4

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 4050)

Abstract

This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler.
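
The idea of evaluating several versions of a code section inside one run can be illustrated with a minimal sketch. It is not the authors' implementation, which relies on compiler instrumentation. It assumes a hypothetical `VersionSelector` class, a simple stability test (the last `window` baseline costs differ by at most a relative `tolerance`), and made-up version names (`unroll4`, `tile32`). Each version returns its own cost, standing in for a timer or hardware counter reading.

```python
from typing import Callable, Dict, List, Optional


class VersionSelector:
    """Pick the cheapest of several versions of one kernel at run time.

    Hypothetical sketch: a phase counts as stable when the last `window`
    baseline costs differ by at most `tolerance` (relative to the largest).
    """

    def __init__(self, versions: Dict[str, Callable[[], float]],
                 baseline: str, window: int = 3, tolerance: float = 0.05):
        self.versions = versions
        self.baseline = baseline
        self.window = window
        self.tolerance = tolerance
        self.history: List[float] = []            # baseline costs seen so far
        self.pending = [n for n in versions if n != baseline]
        self.measured: Dict[str, float] = {}      # cost of each evaluated version
        self.best: Optional[str] = None

    def _stable(self) -> bool:
        recent = self.history[-self.window:]
        if len(recent) < self.window:
            return False
        lo, hi = min(recent), max(recent)
        return (hi - lo) <= self.tolerance * hi

    def call(self) -> str:
        """Run one invocation of the kernel and return the version used."""
        if self.best is not None:
            # Search finished: always run the selected version.
            self.versions[self.best]()
            return self.best
        if self._stable() and self.pending:
            # Stable phase: spend this invocation on one candidate version.
            name = self.pending.pop(0)
            self.measured[name] = self.versions[name]()
            if not self.pending:
                self.measured[self.baseline] = self.history[-1]
                self.best = min(self.measured, key=self.measured.get)
            return name
        # Not stable yet: keep running and recording the baseline.
        self.history.append(self.versions[self.baseline]())
        return self.baseline


def demo() -> List[str]:
    warmup = iter([20.0, 12.0])                   # unstable start, then 10.0

    def baseline() -> float:
        return next(warmup, 10.0)

    versions = {"baseline": baseline,
                "unroll4": lambda: 7.0,           # hypothetical candidate
                "tile32": lambda: 9.0}            # hypothetical candidate
    selector = VersionSelector(versions, baseline="baseline")
    return [selector.call() for _ in range(9)]


if __name__ == "__main__":
    print(demo())
```

In this toy run the baseline executes five times before its cost stabilizes, each candidate is then tried once, and the remaining calls use `unroll4`, the cheapest version measured. The sketch never re-checks stability after a version is selected; a real scheme would also have to react to phase changes.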

Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications.

Considering 5 representative SPECfp2000 benchmarks, our approach speeds up iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations bring an average speed-up of 1.4.

Author information

Authors and Affiliations

Members of HiPEAC, ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France
Grigori Fursin, Albert Cohen & Olivier Temam
Member of HiPEAC, Institute for Computing Systems Architecture, University of Edinburgh, UK
Grigori Fursin & Michael O’Boyle

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

Cite this paper

Fursin, G., Cohen, A., O’Boyle, M., Temam, O. (2007). Quick and Practical Run-Time Evaluation of Multiple Program Optimizations. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_4

DOI: https://doi.org/10.1007/978-3-540-71528-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer Science, Computer Science (R0)
