The Journal of Supercomputing, Volume 23, Issue 1, pp 81–104

HPCVIEW: A Tool for Top-down Analysis of Node Performance

  • John Mellor-Crummey
  • Robert J. Fowler
  • Gabriel Marin
  • Nathan Tallent

Abstract

It is increasingly difficult for complex scientific programs to attain a significant fraction of peak performance on systems that are based on microprocessors with substantial instruction-level parallelism and deep memory hierarchies. Despite this trend, performance analysis and tuning tools are still not used regularly by algorithm and application designers. To a large extent, existing performance tools fail to meet many user needs and are cumbersome to use. To address these issues, we developed HPCVIEW—a toolkit for combining multiple sets of program profile data, correlating the data with source code, and generating a database that can be analyzed anywhere with a commodity Web browser. We argue that HPCVIEW addresses many of the issues that have limited the usability and the utility of most existing tools. We originally built HPCVIEW to facilitate our own work on data layout and optimizing compilers. Now, in addition to daily use within our group, HPCVIEW is being used by several code development teams in DoD and DoE laboratories as well as at NCSA.
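The workflow the abstract describes has three conceptual steps: gather several sets of line-level profile data, correlate them by source location, and emit a database that a plain Web browser can display. The following Python sketch illustrates only that general idea; it is not HPCVIEW's implementation, and the data structures, metric names, and file names are illustrative assumptions.

```python
# Conceptual sketch (not HPCVIEW itself): merge several line-level profiles,
# correlate them by (file, line), and write a browsable static HTML table.
from collections import defaultdict
from html import escape

def merge_profiles(profiles):
    """Combine {metric: {(file, line): value}} maps into one table keyed
    by source location, with one column per metric."""
    metrics = sorted(profiles)
    table = defaultdict(lambda: {m: 0 for m in metrics})
    for metric, samples in profiles.items():
        for loc, value in samples.items():
            table[loc][metric] += value
    return metrics, table

def write_html(metrics, table, path="profile.html"):
    """Render the merged table as plain HTML, sorted by the first metric
    so the costliest source lines appear first."""
    rows = sorted(table.items(), key=lambda kv: -kv[1][metrics[0]])
    with open(path, "w") as out:
        out.write("<table border='1'>\n<tr><th>file:line</th>")
        out.write("".join(f"<th>{escape(m)}</th>" for m in metrics))
        out.write("</tr>\n")
        for (fname, line), vals in rows:
            out.write(f"<tr><td>{escape(fname)}:{line}</td>")
            out.write("".join(f"<td>{vals[m]:g}</td>" for m in metrics))
            out.write("</tr>\n")
        out.write("</table>\n")

if __name__ == "__main__":
    # Two hypothetical hardware-counter profiles, keyed by source location.
    profiles = {
        "cycles":    {("solver.f", 120): 9.1e8, ("solver.f", 141): 2.3e8},
        "L2_misses": {("solver.f", 120): 4.0e6, ("solver.f", 141): 7.5e5},
    }
    metrics, table = merge_profiles(profiles)
    write_html(metrics, table)
```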

Keywords: performance evaluation, software performance, binary analysis, software tools



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • John Mellor-Crummey
  • Robert J. Fowler
  • Gabriel Marin
  • Nathan Tallent

  1. Department of Computer Science, MS 132, Rice University, Houston
