Abstract
Hardware performance counters report events in the hardware platform (e.g., cache misses, pipeline stalls), in contrast to profiles, which capture program properties (e.g., execution frequencies of basic blocks, methods, or function calls). As platform architectures become more complex and more diverse, it is increasingly important for a compiler to exploit platform-specific information. A dynamic (JIT) compiler is in the unique position of running on the same platform as the target application, but in practice, exploiting the wealth of information available through performance counters is far from easy. If a JIT compiler is to use performance counter information, that information must be fine-grained (e.g., attributing cache misses to a single load instruction) and must be obtainable without undue overhead. We present a combined runtime and compiler framework that ties hardware performance counter information to a dynamic compiler, and we argue that its overhead is low and that the information it provides is fine-grained. As parallel and multi-core architectures proliferate, performance issues will play a crucial role in all compilation engines; this paper reports on a modular approach to making such counter information available to the compiler.
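To make the notion of fine-grained attribution concrete, the following is a minimal sketch, not the framework described in the paper. It assumes that some sampling mechanism delivers the instruction address at which each event (for example, a cache miss) occurred, and that the runtime knows the machine-code address range of every compiled method. The names `CompiledMethod` and `attribute_samples`, and all addresses, are hypothetical and chosen only for illustration.

```python
import bisect
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledMethod:
    """Hypothetical record of where a JIT-compiled method lives in memory."""
    name: str
    start: int  # address of the first machine instruction
    end: int    # one past the last machine-code address


def attribute_samples(methods, samples):
    """Count sampled event addresses per (method name, offset in method)."""
    ordered = sorted(methods, key=lambda m: m.start)
    starts = [m.start for m in ordered]
    counts = Counter()
    for pc in samples:
        # Find the last method whose start address is <= pc.
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0 and pc < ordered[i].end:
            counts[(ordered[i].name, pc - ordered[i].start)] += 1
        else:
            # The address lies outside all known compiled code.
            counts[("<unknown>", pc)] += 1
    return counts


# Made-up method ranges and sampled addresses (assumptions for illustration).
methods = [
    CompiledMethod("Foo.bar", 0x1000, 0x1040),
    CompiledMethod("Foo.baz", 0x2000, 0x2020),
]
samples = [0x1008, 0x1008, 0x2010, 0x3000, 0x1008]
hot = attribute_samples(methods, samples)
```

In this sketch, the per-offset counts are what a compiler could consult to find the individual instruction responsible for most events in a method; mapping an offset back to a bytecode or intermediate-representation instruction would need an additional machine-code map that is not modeled here.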
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schneider, F., Gross, T.R. (2006). Using Platform-Specific Performance Counters for Dynamic Compilation. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_23
DOI: https://doi.org/10.1007/978-3-540-69330-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science, Computer Science (R0)