Performance Measurement and Analysis of Transactional Memory and Speculative Execution on IBM Blue Gene/Q

  • Jie Jiang
  • Peter Philippen
  • Michael Knobloch
  • Bernd Mohr
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8632)

Abstract

The core count of modern processors is steadily increasing, forcing programmers to use more concurrent threads or tasks to exploit the available hardware effectively. This in turn makes it increasingly challenging to achieve correct and efficient thread synchronization. To support programmers in this task, IBM introduced hardware transactional memory (TM) and speculative execution (SE) in its Blue Gene/Q system, whose 16-core processor can run 64 simultaneous hardware threads in SMT mode. TM and SE allow code to be parallelized even when race conditions may occur; when a conflict is detected, the affected parts of the execution are rolled back and re-executed serially. This incurs overhead, so the use of TM and SE must be well justified. In this paper, we describe extensions to the community instrumentation and measurement infrastructure Score-P that allow developers to instrument, measure, and analyze such applications. To our knowledge, this is the first integrated performance tool framework for analyzing TM/SE programs. We demonstrate its usefulness and effectiveness by describing experiments with benchmarks and a real-world application.
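The rollback-and-re-execute behaviour described in the abstract can be illustrated with a small toy model. The Python sketch below is only an illustration under simplifying assumptions: it is not the Blue Gene/Q hardware mechanism, not IBM's compiler interface, and not part of Score-P. All names in it (`TrackingView`, `increment`, `run_speculatively`) are hypothetical. Tasks first run optimistically against a shared snapshot; a task that touched a key already committed by an earlier task is rolled back and re-executed serially.

```python
"""Toy model of optimistic execution with rollback (illustrative only)."""


class TrackingView:
    """Buffers a task's writes and records which keys it read."""

    def __init__(self, backing):
        self._backing = backing
        self.reads = set()
        self.writes = {}

    def read(self, key):
        if key in self.writes:  # a task sees its own buffered writes
            return self.writes[key]
        self.reads.add(key)
        return self._backing[key]

    def write(self, key, value):
        self.writes[key] = value


def increment(key):
    """Return a task that adds 1 to memory[key] (hypothetical helper)."""
    def task(view):
        view.write(key, view.read(key) + 1)
    return task


def run_speculatively(memory, tasks):
    """Run tasks optimistically, then commit in order; return rollback count."""
    # Speculative phase: every task sees the same initial snapshot,
    # as if all tasks had started at the same time.
    snapshot = dict(memory)
    views = []
    for task in tasks:
        view = TrackingView(snapshot)
        task(view)
        views.append(view)

    # Commit phase: detect conflicts against already committed writes.
    committed_keys = set()
    rollbacks = 0
    for task, view in zip(tasks, views):
        touched = view.reads | set(view.writes)
        if touched & committed_keys:
            # Conflict: discard the speculative result and re-execute
            # serially against the up-to-date memory.
            rollbacks += 1
            view = TrackingView(memory)
            task(view)
        memory.update(view.writes)
        committed_keys |= set(view.writes)
    return rollbacks


if __name__ == "__main__":
    memory = {"counter": 0, "other": 0}
    tasks = [increment("counter")] * 3 + [increment("other")]
    rollbacks = run_speculatively(memory, tasks)
    print(memory, rollbacks)
```

With three increments of the same counter and one increment of an unrelated key, the final state is `{'counter': 3, 'other': 1}` with 2 rollbacks: the result is correct, but two of the four tasks did their work twice, which is the kind of overhead the abstract says must be justified.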

Keywords

Parallel Programming, Performance Analysis, Transactional Memory, Speculative Execution, Blue Gene/Q

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jie Jiang (1, 2)
  • Peter Philippen (1)
  • Michael Knobloch (1)
  • Bernd Mohr (1)
  1. Institute for Advanced Simulation, Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
  2. School of Computer Science, National University of Defense Technology, Changsha, China
