Workload analysis of computation intensive tasks: Case study on SPEC CPU95 benchmarks
Several performance analysis tools have been developed, but they either depend on dedicated hardware or rely on computationally intensive simulation. Modern microprocessors, with hardware support for counting system hardware events, now make possible universal software tools for the performance analysis of complex application programs such as the SPEC benchmarks.
In this paper, we present a new method to determine the system resource utilization (cache miss ratios, CPI values, branch mispredictions) of arbitrary programs. The method is based on a sampling technique combined with access to processor-internal event counter registers. We present the sprof tool set, which is based on this method and also enables the detailed analysis of individual subroutines of a program as they are executed over time. The high accuracy and the negligible overhead of the tool set are demonstrated. We used the SPEC95 benchmark suite, which consists of 8 integer and 10 floating-point intensive non-trivial programs that are commonly used to measure the performance of workstations and servers. As an example, we present the analysis of a SPEC CPU95 benchmark program on different processor architectures.
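To illustrate how sampled counter readings can be turned into the metrics named above, the following is a minimal sketch and not the sprof implementation. It assumes that cumulative event counts are read at each sampling point; the `Sample` record, its field names, and the two functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Cumulative event counts read at one sampling point (hypothetical layout)."""
    cycles: int
    instructions: int
    cache_accesses: int
    cache_misses: int
    branches: int
    branch_mispredictions: int


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None when no events of the denominator type occurred in the interval.
    return numerator / denominator if denominator else None


def interval_metrics(prev: Sample, curr: Sample) -> Dict[str, Optional[float]]:
    """Derive per-interval metrics from the difference of two cumulative samples."""
    return {
        "cpi": _ratio(curr.cycles - prev.cycles,
                      curr.instructions - prev.instructions),
        "cache_miss_ratio": _ratio(curr.cache_misses - prev.cache_misses,
                                   curr.cache_accesses - prev.cache_accesses),
        "branch_misprediction_ratio": _ratio(
            curr.branch_mispredictions - prev.branch_mispredictions,
            curr.branches - prev.branches),
    }


def metrics_over_time(samples: List[Sample]) -> List[Dict[str, Optional[float]]]:
    """Compute metrics for each pair of consecutive samples."""
    return [interval_metrics(a, b) for a, b in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    trace = [
        Sample(1000, 500, 200, 20, 100, 5),
        Sample(3000, 1500, 600, 60, 300, 15),
    ]
    for row in metrics_over_time(trace):
        print(row)
```

Working with differences between consecutive samples, rather than totals, is what allows the metrics to be followed over the run of a program. Attributing each interval to a subroutine would additionally require recording the program counter at each sample, which this sketch omits.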
Keywords: Execution Time, Event Counter, Compress Function, Device Driver, UNIX System