Abstract
In this paper we present results obtained using a compiler to predict the performance of scientific codes. The compiler, Polaris [3], is both the primary tool for estimating the performance of a range of codes and the beneficiary of the results obtained from predicting program behavior at compile time. We show that a simple compile-time model, augmented with profiling data obtained using very light instrumentation, predicts execution performance to within 20% (on average) of the measured performance for codes using both dense and sparse computational methods.
This work is supported in part by Army contract DABT63-95-C-0097; Army contract N66001-97-C-8532; NSF contract MIP-9619351; and a Partnership Award from IBM. This work is not necessarily representative of the positions or policies of the Army or Government.
References
1. T. Ball and J. R. Larus. Branch prediction for free. In Proceedings of the ACM SIGPLAN Conference on Programming Languages Design and Implementation ’93, pages 300–313, 1993.
2. U. Banerjee. Dependence analysis. Kluwer Academic Publishers, 1997.
3. W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, W. Pottenger, L. Rauchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, December 1996.
4. R. Bramley, D. Gannon, T. Stuckey, J. Villacis, J. Balasubramanian, E. Akman, F. Breg, S. Diwan, and M. Govindaraju. The Linear System Analyzer, chapter PSEs. IEEE, 1998.
5. C. Cascaval and D. A. Padua. Compile-time cache misses estimation using stack distances. In preparation.
6. P. P. Chang, S. A. Mahlke, and W.-M. W. Hwu. Using profile information to assist classic compiler code optimizations. Software Practice and Experience, 21(12):1301–1321, December 1991.
7. R. P. Colwell, R. P. Nix, J. J. O’Donnell, D. B. Papworth, and P. K. Rodman. A VLIW architecture for a trace scheduling compiler. In Proceedings of ASPLOS II, pages 180–192, Palo Alto, CA, October 1987.
8. L. DeRose, Y. Zhang, and D. A. Reed. SvPablo: A multi-language performance analysis system. In 10th International Conference on Computer Performance Evaluation: Modelling Techniques and Tools (Performance Tools ’98), pages 352–355, Palma de Mallorca, Spain, September 1998.
9. T. Fahringer. Evaluation of benchmark performance estimation for parallel Fortran programs on massively parallel SIMD and MIMD computers. In IEEE Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing, Malaga, Spain, January 1994.
10. T. Fahringer. Automatic Performance Prediction of Parallel Programs. Kluwer Academic Press, 1996.
11. T. Fahringer. Estimating cache performance for sequential and data parallel programs. Technical Report TR 97-9, Institute for Software Technology and Parallel Systems, Univ. of Vienna, Vienna, Austria, October 1997.
12. J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30(7):478–490, July 1981.
13. J. D. Gee, M. D. Hill, and A. J. Smith. Cache performance of the SPEC92 benchmark suite. In Proceedings of the IEEE Micro, pages 17–27, August 1993.
14. S. Ghosh, M. Martonosi, and S. Malik. Precise miss analysis for program transformations with caches of arbitrary associativity. In Proceedings of ASPLOS VIII, San Jose, CA, October 1998.
15. M. D. Hill and A. J. Smith. Evaluating associativity in CPU caches. IEEE Transactions on Computers, 38(12):1612–1630, December 1989.
16. Y. Kang, M. Huang, S.-M. Yoo, Z. Ge, D. Keen, V. Lam, P. Pattnaik, and J. Torrellas. FlexRAM: Toward an advanced intelligent memory system. In International Conference on Computer Design (ICCD), October 1999.
17. R. L. Mattson, J. Gecsei, D. Slutz, and I. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2), 1970.
18. J. Reilly. SPEC95 Products and Benchmarks. SPEC Newsletter, September 1995.
19. R. Saavedra and A. Smith. Measuring cache and TLB performance and their effect on benchmark run times. IEEE Transactions on Computers, 44(10):1223–1235, October 1995.
20. R. H. Saavedra-Barrera and A. J. Smith. Analysis of benchmark characteristics and benchmark performance prediction. Technical Report CSD 92-715, Computer Science Division, UC Berkeley, 1992.
21. R. H. Saavedra-Barrera, A. J. Smith, and E. Miya. Machine characterization based on an abstract high-level language machine. IEEE Transactions on Computers, 38(12):1659–1679, December 1989.
22. V. Sarkar. Determining average program execution times and their variance. In Proceedings of the ACM SIGPLAN Conference on Programming Languages Design and Implementation ’89, pages 298–312, Portland, Oregon, July 1989.
23. R. A. Sugumar and S. G. Abraham. Set-associative cache simulation using generalized binomial trees. ACM Transactions on Computer Systems, 13(1), 1995.
24. J. G. Thompson and A. J. Smith. Efficient (stack) algorithms for analysis of write-back and sector memories. ACM Transactions on Computer Systems, 7(1), 1989.
25. W.-H. Wang and J.-L. Baer. Efficient trace-driven simulation methods for cache performance analysis. ACM Transactions on Computer Systems, 9(3), 1991.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cascaval, C., DeRose, L., Padua, D.A., Reed, D.A. (2000). Compile-Time Based Performance Prediction. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_23
DOI: https://doi.org/10.1007/3-540-44905-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67858-8
Online ISBN: 978-3-540-44905-8
eBook Packages: Springer Book Archive