Compile-Time Based Performance Prediction

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1863)

Abstract

In this paper we present results obtained by using a compiler to predict the performance of scientific codes. The compiler, Polaris [3], is both the primary tool for estimating the performance of a range of codes and the beneficiary of the results obtained from predicting program behavior at compile time. We show that a simple compile-time model, augmented with profiling data obtained using very light instrumentation, can be accurate to within 20% (on average) of the measured performance for codes using both dense and sparse computational methods.

This work is supported in part by Army contract DABT63-95-C-0097; Army contract N66001-97-C-8532; NSF contract MIP-9619351; and a Partnership Award from IBM. This work is not necessarily representative of the positions or policies of the Army or Government.

References

  1. T. Ball and J. R. Larus. Branch prediction for free. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation ’93, pages 300–313, 1993.

  2. U. Banerjee. Dependence analysis. Kluwer Academic Publishers, 1997.

  3. W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, W. Pottenger, L. Rauchwerger, and P. Tu. Parallel Programming with Polaris. IEEE Computer, December 1996.

  4. R. Bramley, D. Gannon, T. Stuckey, J. Villacis, J. Balasubramanian, E. Akman, F. Breg, S. Diwan, and M. Govindaraju. The Linear System Analyzer, chapter PSEs. IEEE, 1998.

  5. C. Cascaval and D. A. Padua. Compile-time cache misses estimation using stack distances. In preparation.

  6. P. P. Chang, S. A. Mahlke, and W.-M. W. Hwu. Using profile information to assist classic compiler code optimizations. Software Practice and Experience, 21(12):1301–1321, December 1991.

  7. R. P. Colwell, R. P. Nix, J. J. O’Donnell, D. B. Papworth, and P. K. Rodman. A VLIW architecture for a trace scheduling compiler. In Proceedings of ASPLOS II, pages 180–192, Palo Alto, CA, October 1987.

  8. L. DeRose, Y. Zhang, and D. A. Reed. SvPablo: A multi-language performance analysis system. In 10th International Conference on Computer Performance Evaluation-Modelling Techniques and Tools-Performance Tools’98, pages 352–355, Palma de Mallorca, Spain, September 1998.

  9. T. Fahringer. Evaluation of benchmark performance estimation for parallel Fortran programs on massively parallel SIMD and MIMD computers. In IEEE Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing, Malaga, Spain, January 1994.

  10. T. Fahringer. Automatic Performance Prediction of Parallel Programs. Kluwer Academic Publishers, 1996.

  11. T. Fahringer. Estimating cache performance for sequential and data parallel programs. Technical Report TR 97-9, Institute for Software Technology and Parallel Systems, Univ. of Vienna, Vienna, Austria, October 1997.

  12. J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30(7):478–490, July 1981.

  13. J. D. Gee, M. D. Hill, and A. J. Smith. Cache performance of the SPEC92 benchmark suite. In Proceedings of the IEEE Micro, pages 17–27, August 1993.

  14. S. Ghosh, M. Martonosi, and S. Malik. Precise Miss Analysis for Program Transformations with Caches of Arbitrary Associativity. In Proceedings of ASPLOS VIII, San Jose, CA, October 1998.

  15. M. D. Hill and A. J. Smith. Evaluating associativity in CPU caches. IEEE Transactions on Computers, 38(12):1612–1630, December 1989.

  16. Y. Kang, M. Huang, S.-M. Yoo, Z. Ge, D. Keen, V. Lam, P. Pattnaik, and J. Torrellas. FlexRAM: Toward an advanced intelligent memory system. In International Conference on Computer Design (ICCD), October 1999.

  17. R. L. Mattson, J. Gecsei, D. Slutz, and I. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2), 1970.

  18. J. Reilly. SPEC95 Products and Benchmarks. SPEC Newsletter, September 1995.

  19. R. Saavedra and A. Smith. Measuring cache and TLB performance and their effect on benchmark run times. IEEE Transactions on Computers, 44(10):1223–1235, October 1995.

  20. R. H. Saavedra-Barrera and A. J. Smith. Analysis of benchmark characteristics and benchmark performance prediction. Technical Report CSD 92-715, Computer Science Division, UC Berkeley, 1992.

  21. R. H. Saavedra-Barrera, A. J. Smith, and E. Miya. Machine characterization based on an abstract high-level language machine. IEEE Transactions on Computers, 38(12):1659–1679, December 1989.

  22. V. Sarkar. Determining average program execution times and their variance. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation ’89, pages 298–312, Portland, Oregon, July 1989.

  23. R. A. Sugumar and S. G. Abraham. Set-associative cache simulation using generalized binomial trees. ACM Trans. Comp. Sys., 13(1), 1995.

  24. J. G. Thompson and A. J. Smith. Efficient (stack) algorithms for analysis of write-back and sector memories. ACM Transactions on Computer Systems, 7(1), 1989.

  25. W.-H. Wang and J.-L. Baer. Efficient trace-driven simulation methods for cache performance analysis. ACM Transactions on Computer Systems, 9(3), 1991.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cascaval, C., DeRose, L., Padua, D.A., Reed, D.A. (2000). Compile-Time Based Performance Prediction. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_23

  • DOI: https://doi.org/10.1007/3-540-44905-1_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67858-8

  • Online ISBN: 978-3-540-44905-8

  • eBook Packages: Springer Book Archive
