Evolution of a Parallel Performance System

  • Allen D. Malony
  • Sameer Shende
  • Alan Morris
  • Scott Biersdorff
  • Wyatt Spear
  • Kevin Huck
  • Aroon Nataraj


The TAU Performance System® is an integrated suite of tools for instrumentation, measurement, and analysis of parallel programs targeting large-scale, high-performance computing (HPC) platforms. Representing more than fifteen calendar years and fifty person-years of research and development effort, TAU has been driven by four concerns: portability, flexibility, interoperability, and scalability. The result is a performance system that has evolved into a leading framework for parallel performance evaluation and problem solving. This paper presents the current state of TAU, gives an overview of the design and function of its main features, discusses best practices for using TAU, and outlines future development.


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Allen D. Malony (1)
  • Sameer Shende (1)
  • Alan Morris (1)
  • Scott Biersdorff (1)
  • Wyatt Spear (1)
  • Kevin Huck (1)
  • Aroon Nataraj (1)

  1. Performance Research Lab, University of Oregon, Eugene