Exploiting Free Execution Slots on EPIC Processors for Efficient and Accurate Runtime Profiling

  • Conference paper
Advances in Computer Systems Architecture (ACSAC 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3189))

Abstract

Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, we propose a new profiling technique that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests to the hardware as a few bits of information in branch instructions, and the hardware uses the free execution slots available in a user program to execute the profiling operations. We have implemented the compiler instrumentation for this technique using an Itanium research compiler. Our results show that accurate block profiling incurs very little overhead on the user program in terms of program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling technique is practical. We believe this will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
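To make the idea of running profiling operations in otherwise idle issue slots concrete, the following is a toy cycle-level model in Python. It is a sketch under stated assumptions, not the hardware design from the paper: the function name `simulate_free_slot_profiling`, the pending-request buffer, its capacity, and the stall-on-overflow policy are all hypothetical choices made for illustration. Each scheduled cycle is described by how many issue slots the user program occupies and how many profiling requests (for example, one per branch that ends a profiled block) arrive in that cycle.

```python
def simulate_free_slot_profiling(schedule, issue_width=6, buffer_capacity=4):
    """Toy model (assumption, not the paper's design).

    schedule: list of (used_slots, profile_requests) tuples, one per cycle
              of the compiler's original schedule.
    Returns (total_cycles, stall_cycles, executed_ops, still_pending).
    """
    pending = 0    # profiling operations waiting for a free slot
    stalls = 0     # extra cycles inserted only to drain the buffer
    executed = 0   # profiling operations that actually ran

    for used, requests in schedule:
        if not 0 <= used <= issue_width:
            raise ValueError("used_slots must be between 0 and issue_width")

        # Run waiting profiling operations in the slots the user code leaves idle.
        free = issue_width - used
        run = min(free, pending)
        pending -= run
        executed += run

        # Requests carried by this cycle's branches become pending afterwards.
        pending += requests

        # Hypothetical policy: if the buffer overflows, insert a stall cycle
        # in which every slot is available to profiling operations.
        while pending > buffer_capacity:
            stalls += 1
            run = min(issue_width, pending)
            pending -= run
            executed += run

    return len(schedule) + stalls, stalls, executed, pending


if __name__ == "__main__":
    # Five cycles on a 6-wide machine; three of them carry one request each.
    demo = [(6, 1), (4, 0), (6, 1), (6, 1), (3, 0)]
    print(simulate_free_slot_profiling(demo, issue_width=6, buffer_capacity=1))
```

In this model, overhead appears only as stall cycles, so a schedule with enough idle slots profiles every block at no cost in cycles. The ratio of stall cycles to original cycles plays the role of the scheduling-cycle overhead that the abstract reports, although the model makes no attempt to reproduce the reported figures.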

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, Y., Lee, YF. (2004). Exploiting Free Execution Slots on EPIC Processors for Efficient and Accurate Runtime Profiling. In: Yew, PC., Xue, J. (eds) Advances in Computer Systems Architecture. ACSAC 2004. Lecture Notes in Computer Science, vol 3189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30102-8_19

  • DOI: https://doi.org/10.1007/978-3-540-30102-8_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23003-8

  • Online ISBN: 978-3-540-30102-8

  • eBook Packages: Springer Book Archive
