Performance Analysis of OpenSHMEM Applications with TAU Commander

  • Conference paper
OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence (OpenSHMEM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10679)


The TAU Performance System® (TAU) is a powerful and highly versatile profiling and tracing tool ecosystem for performance engineering of parallel programs. Developed over the last twenty years, TAU has evolved with each new generation of HPC systems and scales efficiently to hundreds of thousands of cores. TAU’s organic growth has resulted in a loosely coupled software toolbox such that novice users first encountering TAU’s complexity and vast array of features are often intimidated and easily frustrated. To lower the barrier to entry for novice TAU users, ParaTools and the US Department of Energy have developed “TAU Commander,” a performance engineering workflow manager that facilitates a systematic approach to performance engineering, guides users through common profiling and tracing workflows, and offers constructive feedback in case of error. This work compares TAU and TAU Commander workflows for common performance engineering tasks in OpenSHMEM applications and demonstrates workflows targeting two different SHMEM implementations, Intel Xeon “Haswell” and “Knights Landing” processors, direct and indirect measurement methods, callsite, profiles, and traces.



  1. Note that not producing any data is a valid experimental result, i.e., this particular experiment raises a fault in the application and the end goal is to use post-mortem debugging to determine the cause of the fault [19].


  1. U.S. Department of Energy INCITE leadership computing, December 2015.

  2. Bader, D.A., Cong, G.: Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Par. Distrib. Comp. 66(11), 1366–1378 (2006).

  3. Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)

  4. Chapman, B., Curtis, T., Pophale, S., Poole, S., Kuehn, J., Koelbel, C., Smith, L.: Introducing OpenSHMEM: SHMEM for the PGAS community. In: Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, PGAS 2010, pp. 2:1–2:3. ACM, New York (2010).

  5. Francis, I., Drugan, C.: Groundbreaking astrophysics accelerated. HPC Source, February 2013

  6. Geimer, M., Wolf, F., Wylie, B.J.N., Mohr, B.: Scalable parallel trace-based performance analysis. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) EuroPVM/MPI 2006. LNCS, vol. 4192, pp. 303–312. Springer, Heidelberg (2006).

  7. Hemstad, J., Hanebutte, U.R.: ISx: An integer sort mini-application for the exascale era (2015). Partitioned Global Address Space SC’15 Booth

  8. Jose, J., Kandalla, K., Luo, M., Panda, D.: Supporting hybrid MPI and OpenSHMEM over InfiniBand: design and performance evaluation. In: The 41st International Conference on Parallel Processing (ICPP), pp. 219–228 (2012)

  9. Knupfer, A., Brunst, H., Nagel, W.: High performance event trace visualization. In: Proceedings of Parallel and Distributed Processing (PDP). IEEE (2005)

  10. Linford, J.C.: TAU Commander developer documentation, June 2017.

  11. Linford, J.C., Vadlamani, S., Shende, S., Malony, A.D., Jones, W., Anderson, W.K., Nielsen, E.: Performance engineering FUN3D at scale with TAU Commander. In: Proceedings of the ACM/IEEE The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2016), November 2016. To Appear

  12. Malony, A., Biersdorff, S., Shende, S., Jagode, H., Tomov, S., Juckeland, G., Dietrich, R., Poole, D., Lamb, C.: Parallel performance measurement of heterogeneous parallel systems with GPUs. In: 2011 International Conference on Parallel Processing (ICPP), pp. 176–185, September 2011

  13. Malony, A.D., Mellor-Crummey, J., Shende, S.S.: Measurement and analysis of parallel program performance using TAU and HPCToolkit. In: Performance Tuning of Scientific Applications. CRC Press, New York, November 2010

  14. ParaTools, Inc.: TAU Commander: An intuitive interface for the TAU Performance Analysis System (2014).

  15. Perez, J., Shende, S.: Furthering the understanding of coronal heating and solar wind origin. Technical report, Argonne National Labs, January 2013

  16. Pophale, S., Nanjegowda, R., Curtis, T., Chapman, B., Jin, H., Poole, S., Kuehn, J.: OpenSHMEM performance and potential: a NPB experimental study. In: The 6th Conference on Partitioned Global Address Space Programming Models (PGAS 2012) (2012)

  17. Seager, K., Choi, S.-E., Dinan, J., Pritchard, H., Sur, S.: Design and implementation of OpenSHMEM using OFI on the Aries interconnect. In: Venkata, M.G., Imam, N., Pophale, S., Mintz, T.M. (eds.) OpenSHMEM 2016. LNCS, vol. 10007, pp. 97–113. Springer, Cham (2016).

  18. Shende, S., Malony, A.: The TAU Parallel Performance System. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006)

  19. Shende, S., Malony, A., Linford, J., Wissink, A., Adamec, S.: Isolating runtime faults with callstack debugging using TAU. In: Proceedings of the HPEC 2012 Conference (2012)


This work is supported by the United States Department of Energy under DOE SBIR grant DE-SC0009593. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.


Corresponding author

Correspondence to John C. Linford.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Linford, J.C., Khuvis, S., Shende, S., Malony, A., Imam, N., Venkata, M.G. (2018). Performance Analysis of OpenSHMEM Applications with TAU Commander. In: Gorentla Venkata, M., Imam, N., Pophale, S. (eds) OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence. OpenSHMEM 2017. Lecture Notes in Computer Science, vol 10679. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73813-0

  • Online ISBN: 978-3-319-73814-7

  • eBook Packages: Computer Science (R0)
