
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations

  • Conference paper
Transactions on High-Performance Embedded Architectures and Compilers I

Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 4050))

Abstract

This article aims to make iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases, to evaluate several optimizations within one run. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler.

Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications.

Across 5 representative SPECfp2000 benchmarks, our approach speeds up the iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations achieve an average speed-up of 1.4.
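
As a rough illustration of the idea of evaluating several code versions during a stable phase of a single run, the following minimal sketch may help. It is not the authors' scheme: the class name `VersionedKernel`, the stability test (a fixed window of baseline timings within a relative tolerance), and the parameters `tolerance` and `window` are all assumptions made for this example, and the versions are plain Python callables rather than compiler-generated clones.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class VersionedKernel:
    """Hypothetical run-time selector among several versions of one code section.

    Assumption: a phase counts as "stable" once `window` consecutive baseline
    calls take times within `tolerance` (relative) of each other.
    """

    def __init__(self, versions: Dict[str, Callable[..., Any]],
                 timer: Callable[[], float] = time.perf_counter,
                 tolerance: float = 0.05, window: int = 3) -> None:
        self.versions = versions                 # name -> callable
        self.baseline = next(iter(versions))     # first entry is the baseline
        self.timer = timer
        self.tolerance = tolerance
        self.window = window
        self.history: List[float] = []           # baseline timings seen so far
        self.pending: List[str] = []             # versions still to be timed
        self.scores: Dict[str, float] = {}       # name -> measured time
        self.best: Optional[str] = None          # chosen version, once known

    def _timed(self, name: str, *args: Any):
        # Instrumentation: time one call of the named version.
        start = self.timer()
        result = self.versions[name](*args)
        return result, self.timer() - start

    def _stable(self) -> bool:
        # Simple phase test on the most recent baseline timings.
        recent = self.history[-self.window:]
        if len(recent) < self.window:
            return False
        return (max(recent) - min(recent)) <= self.tolerance * max(recent)

    def __call__(self, *args: Any) -> Any:
        if self.best is not None:                # search finished: use the winner
            return self.versions[self.best](*args)
        if self.pending:                         # inside a stable phase: try next version
            name = self.pending.pop(0)
            result, cost = self._timed(name, *args)
            self.scores[name] = cost
            if not self.pending:
                self.best = min(self.scores, key=self.scores.get)
            return result
        result, cost = self._timed(self.baseline, *args)
        self.history.append(cost)
        if self._stable():                       # phase detected: schedule the candidates
            self.scores = {self.baseline: cost}
            self.pending = [n for n in self.versions if n != self.baseline]
            if not self.pending:
                self.best = self.baseline
        return result
```

Every call still performs useful work, since each version computes the same result; only the timing differs. A real implementation would also have to re-check stability while candidates are being timed and restart the search when the phase changes, which this sketch omits.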




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fursin, G., Cohen, A., O’Boyle, M., Temam, O. (2007). Quick and Practical Run-Time Evaluation of Multiple Program Optimizations. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_4


  • DOI: https://doi.org/10.1007/978-3-540-71528-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71527-6

  • Online ISBN: 978-3-540-71528-3

  • eBook Packages: Computer Science, Computer Science (R0)
