Trace-Based Detection of Lock Contention in MPI One-Sided Communication

Hermanns, Marc-André; Geimer, Markus; Mohr, Bernd; Wolf, Felix

doi:10.1007/978-3-319-56702-0_6

Abstract

Performance analysis is an essential part of the development process of HPC applications. Developers therefore need adequate tools to evaluate design and implementation decisions if they are to develop efficient parallel applications. It is crucial that such tools support the available language and library features as completely as possible, so that design decisions are not negatively influenced by the level of available tool support. The Message Passing Interface (MPI) supports three basic communication paradigms: point-to-point, collective, and one-sided. Each of these targets, and excels at, a specific application scenario. While current performance tools support the first two quite well, one-sided communication is often neglected. In our earlier work, we narrowed this gap by showing how wait states in MPI one-sided communication using active-target synchronization can be detected at large scale using our trace-based message replay technique. Further extending our work on the detection of progress-related wait states in ARMCI, this paper presents an improved infrastructure that is capable of detecting not only progress-related wait states, but also wait states due to lock contention in MPI passive-target synchronization. We present an event-based definition of lock contention, the trace-based algorithm to detect it, and initial results with a micro-benchmark and an application kernel scaling up to 65,536 processes.
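
To make the notion of a lock-contention wait state more concrete, the following is a minimal illustrative sketch and not the algorithm presented in the chapter. It assumes that a trace already provides, for each exclusive lock epoch on one window at one target, the time the lock was requested, acquired, and released, all on a globally synchronized clock. The names `LockEpoch` and `contention_wait` are hypothetical. A rank's waiting time is taken to be the part of its request-to-acquire interval during which another rank held the lock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockEpoch:
    """One exclusive lock epoch on a single window (hypothetical record)."""
    rank: int
    requested: float  # time the origin asked for the lock
    acquired: float   # time the lock was granted
    released: float   # time the lock was released


def contention_wait(epochs):
    """Return {rank: time spent waiting while another rank held the lock}."""
    waits = {}
    for e in epochs:
        blocked = 0.0
        for other in epochs:
            if other.rank == e.rank:
                continue
            # Overlap of e's waiting interval with other's holding interval.
            start = max(e.requested, other.acquired)
            end = min(e.acquired, other.released)
            if end > start:
                blocked += end - start
        waits[e.rank] = waits.get(e.rank, 0.0) + blocked
    return waits


if __name__ == "__main__":
    trace = [
        LockEpoch(rank=0, requested=0.0, acquired=0.0, released=2.0),
        LockEpoch(rank=1, requested=0.5, acquired=2.0, released=3.0),
        LockEpoch(rank=2, requested=1.0, acquired=3.0, released=3.5),
    ]
    print(contention_wait(trace))
```

The sketch compares all epochs pairwise and keeps them in one place, which is only practical for small examples. It also ignores shared locks and the fact that an MPI implementation may defer the actual lock acquisition.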

References

Adhianto, L., Banerjee, S., Fagan, M.W., Krentel, M., Marin, G., Mellor-Crummey, J.M., Tallent, N.R.: HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exper. 22 (6), 685–701 (2010). doi:10.1002/cpe.1553. http://doi.wiley.com/10.1002/cpe.1553
Böhme, D., Geimer, M., Wolf, F., Arnold, L.: Identifying the root causes of wait states in large-scale parallel applications. In: Proceedings of the 39th International Conference on Parallel Processing (ICPP), San Diego, CA, pp. 90–100 (2010). doi:10.1109/ICPP.2010.18
Böhme, D., de Supinski, B.R., Geimer, M., Schulz, M., Wolf, F.: Scalable critical-path based performance analysis. In: Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai (2012)
Chapman, B.M., Curtis, T., Pophale, S., Poole, S.W., Kuehn, J.A., Koelbel, C., Smith, L.: Introducing OpenSHMEM: SHMEM for the PGAS community. In: Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, no. c in PGAS ’10, pp. 2:1–2:3. ACM, New York, NY (2010). doi:10.1145/2020373.2020375. http://doi.acm.org/10.1145/2020373.2020375
Geimer, M., Wolf, F., Wylie, B.J.N., Mohr, B.: A scalable tool architecture for diagnosing wait states in massively parallel applications. Parallel Comput. 35 (7), 375–388 (2009). doi:10.1016/j.parco.2009.02.003
Hermanns, M.A., Geimer, M., Mohr, B., Wolf, F.: Scalable detection of MPI-2 remote memory access inefficiency patterns. Int. J. High Perform. Comput. Appl. 26 (3), 227–236 (2012). doi:10.1177/1094342011406758
Hermanns, M.A., Krishnamoorthy, S., Wolf, F.: A scalable infrastructure for the performance analysis of passive target synchronization. Parallel Comput. 39 (3), 132–145 (2013). doi:10.1016/j.parco.2012.09.002. http://www.sciencedirect.com/science/article/pii/S0167819112000762
Intel Corp.: Intel VTune Amplifier XE (2012). http://software.intel.com/en-us/intel-vtune-amplifier-xe
Jülich Supercomputing Centre: JUQUEEN: IBM Blue Gene/Q Supercomputer System at the Jülich Supercomputing Centre. J. Large-Scale Res. Facil. 1 (A1) (2015). doi:10.17815/jlsrf-1-18. http://dx.doi.org/10.17815/jlsrf-1-18
Kühnal, A., Hermanns, M.A., Mohr, B., Wolf, F.: Specification of inefficiency patterns for MPI-2 one-sided communication. In: Proceedings of the 12th Euro-Par Conference, Dresden. Lecture Notes in Computer Science, vol. 4128, pp. 47–62. Springer, Berlin (2006)
MPI Forum (ed.): MPI: A Message-Passing Interface Standard. Version 3.1. MPI Forum (2015). http://www.mpi-forum.org/
Nieplocha, J., Carpenter, B.: ARMCI: a portable remote memory copy library for distributed array libraries and compiler run-time systems. In: Proceedings of the 11 IPPS/SPDP’99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, vol. 1586, pp. 533–546. Springer, London (1999). doi:10.1007/BFb0097937. http://dl.acm.org/citation.cfm?id=645611.662053
Tallent, N.R., Mellor-Crummey, J.M., Porterfield, A.: Analyzing lock contention in multithreaded applications. SIGPLAN Not. 45 (5), 269–280 (2010). doi:10.1145/1837853.1693489. http://doi.acm.org/10.1145/1837853.1693489
Tallent, N.R., Vishnu, A., Van Dam, H., Daily, J., Kerbyson, D.J., Hoisie, A.: Diagnosing the causes and severity of one-sided message contention. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pp. 130–139. ACM, New York, NY (2015). doi:10.1145/2688500.2688516. http://doi.acm.org/10.1145/2688500.2688516
Zounmevo, J.A., Zhao, X., Balaji, P., Gropp, W., Afsahi, A.: Nonblocking epochs in MPI one-sided communication. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’14, pp. 475–486. IEEE Press, Piscataway, NJ (2014). doi:10.1109/SC.2014.44. http://dx.doi.org/10.1109/SC.2014.44

Acknowledgements

This work has been partly funded by the Excellence Initiative of the German federal and state governments. The authors gratefully acknowledge the computing time granted by the JARA-HPC Vergabegremium and VSR commission provided on the JARA-HPC Partition part of the supercomputer JUQUEEN [9] at Forschungszentrum Jülich.

Author information

Authors and Affiliations

JARA-HPC, Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
Marc-André Hermanns & Bernd Mohr
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
Markus Geimer
Parallel Programming, TU Darmstadt, Darmstadt, Germany
Felix Wolf

Corresponding author

Correspondence to Marc-André Hermanns.

Editor information

Editors and Affiliations

Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
Christoph Niethammer
Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
José Gracia
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Tobias Hilbrich
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Andreas Knüpfer
Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
Michael M. Resch
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Wolfgang E. Nagel

About this paper

Cite this paper

Hermanns, MA., Geimer, M., Mohr, B., Wolf, F. (2017). Trace-Based Detection of Lock Contention in MPI One-Sided Communication. In: Niethammer, C., Gracia, J., Hilbrich, T., Knüpfer, A., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-56702-0_6

DOI: https://doi.org/10.1007/978-3-319-56702-0_6
Published: 09 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56701-3
Online ISBN: 978-3-319-56702-0
eBook Packages: Mathematics and Statistics (R0)