Statistical Models for Automatic Performance Tuning

  • Richard Vuduc
  • James W. Demmel
  • Jeff Bilmes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2073)


Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response. They typically operate at compile time by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. This paper applies statistical techniques to exploit the large amount of performance data collected during the search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct run-time decision rules, based on run-time inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware platforms.
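
To make the idea of stopping a search early more concrete, here is a minimal Python sketch. It is not the heuristic developed in the paper; it is a simplified, distribution-free stand-in that assumes implementations are benchmarked in uniformly random order. Under that assumption, the chance that none of t sampled implementations falls in the top eps fraction of the search space is at most (1 - eps)**t, so the search can stop once that bound drops below a tolerance alpha. The names `samples_needed`, `early_stopped_search`, `benchmark`, `eps`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math
import random


def samples_needed(eps, alpha):
    """Smallest t such that (1 - eps)**t <= alpha.

    After t uniformly random samples, the best one seen lies in the
    top eps fraction of all candidates with probability >= 1 - alpha.
    """
    return math.ceil(math.log(alpha) / math.log(1.0 - eps))


def early_stopped_search(candidates, benchmark, eps=0.05, alpha=0.05, seed=0):
    """Benchmark candidates in random order and stop after a fixed budget.

    `benchmark` is an assumed callable that returns a performance score
    (higher is better) for one candidate implementation.
    Returns (best candidate, its score, number of candidates benchmarked).
    """
    rng = random.Random(seed)
    order = list(candidates)
    rng.shuffle(order)  # random order without replacement

    budget = min(len(order), samples_needed(eps, alpha))
    best, best_perf = None, float("-inf")
    for impl in order[:budget]:
        perf = benchmark(impl)
        if perf > best_perf:
            best, best_perf = impl, perf
    return best, best_perf, budget
```

For example, with eps = 0.05 and alpha = 0.05 the budget is 59 benchmark runs, independent of how many implementations the generator produced. A rule that also uses the measured performance values, as the abstract describes, could stop sooner or later than this fixed budget.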





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Richard Vuduc (1)
  • James W. Demmel (2)
  • Jeff Bilmes (3)
  1. Computer Science Division, University of California at Berkeley, Berkeley, USA
  2. Computer Science Division and Dept. of Mathematics, University of California at Berkeley, Berkeley, USA
  3. Dept. of Electrical Engineering, University of Washington, Seattle, USA
