Statistical Models for Automatic Performance Tuning
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate, at compile-time, by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. This paper applies statistical techniques to exploit the large amount of performance data collected during the search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct run-time decision rules, based on run-time inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware platforms.
Unable to display preview. Download preview PDF.
- 1.J. Bilmes, K. Asanović, C. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a Portable, High-Performance, ANSI C coding methodology. In Proc. of the Int’l Conf. on Supercomputing, Vienna, Austria, July 1997.Google Scholar
- 2.J. Bilmes, K. Asanović, J. Demmel, D. Lam, and C. Chin. The PHiPAC v1.0 matrix-multiply distribution. Technical Report UCB/CSD-98-1020, University of California, Berkeley, October 1998.Google Scholar
- 4.E. Brewer. High-level optimization via automated statistical modeling. In Sym. Par. Alg. Arch., Santa Barbara, California, July 1995.Google Scholar
- 6.M. Frigo and S. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. of the Int’l Conf. on Acoustics, Speech, and Signal Processing, May 1998.Google Scholar
- 7.G. Haentjens. An investigation of recursive FFT implementations. Master’s thesis, Carnegie Mellon University, 2000.Google Scholar
- 8.E.-J. Im and K. Yelick. Optimizing sparse matrix vector multiplication on SMPs. In Proc. of the 9th SIAM Conf. on Parallel Processing for Sci. Comp., March 1999.Google Scholar
- 9.M. I. Jordan. Why the logistic function? Technical Report 9503, MIT, 1995.Google Scholar
- 10.T. Kisuki, P. M. Knijnenburg, M. F. O’Boyle, and H. Wijshoff. Iterative compilation in program optimization. In Proceedings of the 8th International Workshop on Compilers for Parallel Computers, pages 35–44, 2000.Google Scholar
- 12.D. A. Schwartz, R. R. Judd, W. J. Harrod, and D. P. Manley. VSIPL 1.0 API, March 2000. http://www.vsipl.org.
- 13.B. Singer and M. Veloso. Learning to predict performance from formula modeling and training data. In Proc. of the 17th Int’l Conf. on Mach. Learn., 2000.Google Scholar
- 14.S. S. Vadhiyar, G. E. Fagg, and J. Dongarra. Automatically tuned collective operations. In Proceedings of Supercomputing 2000, November 2000.Google Scholar
- 15.V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons, Inc., 1998.Google Scholar
- 16.R. Vuduc, J. Demmel, and J. Bilmes. Statistical modeling of feedback data in an automatic tuning system. In MICRO-33: Third ACM Workshop on Feedback-Directed Dynamic Optimization, December 2000.Google Scholar
- 17.C. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proc. of Supercomp., 1998.Google Scholar