ScalOMP: Analyzing the Scalability of OpenMP Applications

  • Anton Daumen (corresponding author)
  • Patrick Carribault
  • François Trahay
  • Gaël Thomas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)

Abstract

Achieving good scalability in parallel codes is becoming increasingly difficult as hardware grows more complex. Performance tools help developers, but using them is often complicated and highly iterative. In this paper we propose a simple methodology for assessing the scalability of an OpenMP application and for detecting its performance problems. The methodology is implemented in a performance analysis tool named ScalOMP, which relies on the capabilities of OMPT to analyze OpenMP applications. ScalOMP reports the code regions with scalability issues and suggests optimization strategies for them. Our evaluation shows that ScalOMP incurs low overhead and that its suggestions lead to significant performance improvements in several OpenMP applications.
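
OMPT here refers to the OpenMP Tools interface standardized with OpenMP 5.0, through which a tool attaches to the OpenMP runtime without modifying or recompiling the application. The sketch below is our own minimal illustration of that mechanism, not ScalOMP's actual source code: the runtime looks up ompt_start_tool at program start, the tool registers callbacks for parallel-region begin and end events, and per-region statistics can be keyed on the region's return address (codeptr_ra). The tool_* helper names and the callback bodies are hypothetical placeholders.

/* Minimal OMPT tool skeleton (illustrative; assumes an OpenMP 5.0 runtime). */
#include <stdio.h>
#include <omp-tools.h>

/* Called by the runtime when a parallel region starts. codeptr_ra identifies
   the code region, so a tool can key per-region statistics on this address. */
static void on_parallel_begin(ompt_data_t *encountering_task_data,
                              const ompt_frame_t *encountering_task_frame,
                              ompt_data_t *parallel_data,
                              unsigned int requested_parallelism,
                              int flags, const void *codeptr_ra) {
  parallel_data->ptr = (void *)codeptr_ra;
}

/* Called when the parallel region ends; a real tool would aggregate timing
   and load-imbalance statistics for the region here. */
static void on_parallel_end(ompt_data_t *parallel_data,
                            ompt_data_t *encountering_task_data,
                            int flags, const void *codeptr_ra) {
  /* accumulate per-region measurements keyed on parallel_data->ptr */
}

/* The runtime calls this once; the tool registers the callbacks it needs. */
static int tool_initialize(ompt_function_lookup_t lookup,
                           int initial_device_num, ompt_data_t *tool_data) {
  ompt_set_callback_t set_callback =
      (ompt_set_callback_t)lookup("ompt_set_callback");
  set_callback(ompt_callback_parallel_begin,
               (ompt_callback_t)on_parallel_begin);
  set_callback(ompt_callback_parallel_end,
               (ompt_callback_t)on_parallel_end);
  return 1; /* non-zero keeps the tool active */
}

/* Called at shutdown; a real tool would emit its per-region report here. */
static void tool_finalize(ompt_data_t *tool_data) {
  printf("[ompt-tool] done; per-region scalability report goes here\n");
}

/* Entry point the OpenMP runtime looks up when the program starts. */
ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                          const char *runtime_version) {
  static ompt_start_tool_result_t result = {&tool_initialize,
                                            &tool_finalize, {0}};
  return &result;
}

Compiled as a shared library and loaded through the standard OMP_TOOL_LIBRARIES environment variable, such a tool observes every parallel region of an unmodified OpenMP binary, which is consistent with the low instrumentation overhead the abstract reports for this approach.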

Keywords

Performance tool · Scalability · OMPT

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anton Daumen (1, 2), corresponding author
  • Patrick Carribault (1)
  • François Trahay (2)
  • Gaël Thomas (2)

  1. CEA, DAM, DIF, Arpajon, France
  2. Télécom SudParis, Institut Polytechnique de Paris, Évry, France
