Statistical Models for Automatic Performance Tuning
 Richard Vuduc,
 James W. Demmel,
 Jeff Bilmes
Abstract
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate, at compile-time, by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. This paper applies statistical techniques to exploit the large amount of performance data collected during the search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct run-time decision rules, based on run-time inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware platforms.
 Book Title
 Computational Science — ICCS 2001
 Book Subtitle
 International Conference San Francisco, CA, USA, May 28–30, 2001 Proceedings, Part I
 Pages
 pp 117-126
 Copyright
 2001
 DOI
 10.1007/3-540-45545-0_21
 Print ISBN
 978-3-540-42232-7
 Online ISBN
 978-3-540-45545-5
 Series Title
 Lecture Notes in Computer Science
 Series Volume
 2073
 Series ISSN
 0302-9743
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer-Verlag Berlin Heidelberg
 Editors

 Vassil N. Alexandrov ^{(1)}
 Jack J. Dongarra ^{(2)}
 Benjoe A. Juliano ^{(3)}
 René S. Renner ^{(3)}
 C. J. Kenneth Tan ^{(4)}
 Editor Affiliations

 1. School of Computer Science, Cybernetics and Electronic Engineering, University of Reading
 2. Innovative Computing Lab, Computer Science Department, University of Tennessee
 3. Computer Science Department, California State University
 4. School of Computer Science, The Queen’s University of Belfast
 Authors

 Richard Vuduc ^{(5)}
 James W. Demmel ^{(6)}
 Jeff Bilmes ^{(7)}
 Author Affiliations

 5. Computer Science Division, University of California at Berkeley, Berkeley, CA, 94720, USA
 6. Computer Science Division and Dept. of Mathematics, University of California at Berkeley, Berkeley, CA, 94720, USA
 7. Dept. of Electrical Engineering, University of Washington, Seattle, WA, USA