The Journal of Supercomputing, Volume 36, Issue 2, pp 183–196

Automatic tuning of whole applications using direct search and a performance-based transformation system

  • Apan Qasem
  • Ken Kennedy
  • John Mellor-Crummey


In many cases, simple analytical models used by traditional compilers are no longer able to yield effectively optimized code for complex programs because of the enormous complexity of processor architectures. A promising alternative approach for optimizing applications effectively has been the use of search-based empirical methods. The success of empirically tuned library generators such as ATLAS has shown that this strategy can be effective for domain-specific programs. However, to date there has been no general-purpose tool for effective empirical optimization of whole programs. The main obstacle to this approach has been the need for evaluating a prohibitively large number of alternative program variants. To address this problem, we have developed a prototype tool for automatic application tuning that uses loop-level performance feedback and a direct search strategy to guide search for the best set of optimization parameters. Experiments on four different architectures show that direct search can be an effective technique for finding good values for transformation parameters in a reasonable time.
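The direct search idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the abstract does not specify the search variant, so a simple compass-style pattern search is assumed here. The parameter names (tile size, unroll factor), the bounds, and `synthetic_cost` are hypothetical stand-ins for compiling and timing a real program variant.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[int, ...]


def pattern_search(
    cost: Callable[[Point], float],
    start: Point,
    bounds: Tuple[Tuple[int, int], ...],
    step: int = 8,
) -> Tuple[Point, float, int]:
    """Minimize `cost` over integer points with a compass-style direct search.

    Returns (best point, best cost, number of distinct points evaluated).
    """
    cache: Dict[Point, float] = {}

    def evaluate(p: Point) -> float:
        # Each evaluation stands in for building and timing one program
        # variant, so results are cached and never measured twice.
        if p not in cache:
            cache[p] = cost(p)
        return cache[p]

    best = start
    best_cost = evaluate(best)
    while step >= 1:
        improved = False
        for dim in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[dim] += delta
                lo, hi = bounds[dim]
                if not lo <= cand[dim] <= hi:
                    continue  # stay inside the legal parameter range
                c = evaluate(tuple(cand))
                if c < best_cost:
                    best, best_cost, improved = tuple(cand), c, True
        if not improved:
            step //= 2  # no neighbor is better: refine the step size
    return best, best_cost, len(cache)


def synthetic_cost(p: Point) -> float:
    # Hypothetical stand-in for a measured run time (assumption, not data
    # from the paper); its minimum is at tile=48, unroll=4.
    tile, unroll = p
    return (tile - 48) ** 2 + 10 * (unroll - 4) ** 2


if __name__ == "__main__":
    point, value, evaluations = pattern_search(
        synthetic_cost, start=(16, 1), bounds=((1, 128), (1, 16)), step=8
    )
    print(point, value, evaluations)
```

The search uses only measured cost values, with no derivatives or analytical model, and on this synthetic function it visits a small fraction of the 2048 points in the parameter space. That is the property that makes direct search attractive when every evaluation requires running a program variant.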


Keywords (machine-generated, not supplied by the authors): Direct search, Transformation system, Main obstacle, Transformation parameter, Performance feedback





Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Department of Computer Science, Rice University, Houston
