Debugging Latent Synchronization Errors in MPI-3 One-Sided Communication

  • Conference paper
Tools for High Performance Computing 2016

Abstract

The Message Passing Interface (MPI-3) provides a one-sided communication interface, also known as MPI Remote Memory Access (RMA), which enables one process to specify all required communication parameters for both the sending and the receiving side. While this interface offers superior performance potential, developers have to deal with a complex memory consistency model. Properly synchronizing asynchronous remote memory accesses to shared data structures is a challenging task. More importantly, such synchronization bugs are difficult to pinpoint: they do not necessarily manifest as an error, or they surface only later, for example after the application is ported to a different HPC environment.

We introduce a debugging tool that supports the detection of latent synchronization bugs. Exploiting the semantic flexibility of the MPI-3 specification, we dynamically modify the execution of improperly synchronized MPI remote memory accesses to force an error to manifest. An experimental evaluation with small applications and with a library that relies heavily on MPI RMA shows that this approach can uncover synchronization bugs that would otherwise likely go unnoticed.
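To make the idea of a latent synchronization bug concrete, the following is a minimal toy model in Python. It is not the tool's implementation and makes no MPI calls: the `Window` class, its `eager` flag, and both program functions are hypothetical names introduced only for illustration. The sketch assumes the one property it relies on, namely that a one-sided put is only guaranteed to be complete at a synchronization call, so an implementation may apply it immediately or defer it until then.

```python
class Window:
    """Toy stand-in for an RMA window: target memory plus a queue of pending puts."""

    def __init__(self, size, eager):
        self.memory = [0] * size
        self.pending = []
        # eager=True applies puts at issue time; eager=False defers them to flush().
        # Both behaviors are assumed to be permitted by the completion semantics.
        self.eager = eager

    def put(self, offset, value):
        """Issue a one-sided write; completion is only guaranteed after flush()."""
        if self.eager:
            self.memory[offset] = value
        else:
            self.pending.append((offset, value))

    def flush(self):
        """Synchronization point: all previously issued puts complete here."""
        for offset, value in self.pending:
            self.memory[offset] = value
        self.pending.clear()

    def read_at_target(self, offset):
        """What the target would observe in its local memory right now."""
        return self.memory[offset]


def buggy_program(win):
    """Reads the target location before synchronizing: a latent bug."""
    win.put(0, 42)
    observed = win.read_at_target(0)  # missing flush() before this read
    win.flush()
    return observed


def correct_program(win):
    """Synchronizes before reading, so the result is independent of timing."""
    win.put(0, 42)
    win.flush()
    return win.read_at_target(0)
```

Under the eager model the buggy program happens to observe the written value, so the error stays hidden. Under the deferred model the same program observes stale data, while the correctly synchronized program behaves identically in both cases. Deliberately choosing the least favorable permitted behavior is the kind of execution change that can turn a latent bug into a visible one.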

Notes

  1. https://github.com/dash-project/nasty-MPI

References

  1. Chen, Z., Dinan, J., Tang, Z., Balaji, P., Zhong, H., Wei, J., Huang, T., Qin, F.: MC-Checker: detecting memory consistency errors in MPI one-sided applications. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 499–510. IEEE Press, Piscataway (2014)

  2. Dan, A.M., Lam, P., Hoefler, T., Vechev, M.: Modeling and analysis of remote memory access programming. In: Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, Amsterdam, pp. 129–144 (2016)

  3. Faanes, G., Bataineh, A., Roweth, D., Court, T., Froese, E., Alverson, B., Johnson, T., Kopnick, J., Higgins, M., Reinhard, J.: Cray cascade: a scalable HPC system based on a dragonfly network. In: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–9. IEEE, Washington, DC (2012)

  4. Fürlinger, K., Fuchs, T., Kowalewski, R.: DASH: a C++ PGAS library for distributed data structures and parallel algorithms. In: Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications HPCC (2016)

  5. Gropp, W., Thakur, R.: An evaluation of implementation options for MPI one-sided communication. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 415–424. Springer, Berlin (2005)

  6. Hermanns, M.A., Miklosch, M., Böhme, D., Wolf, F.: Understanding the formation of wait states in applications with one-sided communication. In: Proceedings of the 20th European MPI Users’ Group Meeting, pp. 73–78. ACM, New York (2013)

  7. Hilbrich, T., Protze, J., Schulz, M., de Supinski, B.R., Müller, M.S.: MPI runtime error detection with MUST: advances in deadlock detection. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pp. 30:1–30:11. IEEE Computer Society Press, Los Alamitos, CA (2012)

  8. Hoefler, T., Dinan, J., Thakur, R., Barrett, B., Balaji, P., Gropp, W., Underwood, K.: Remote memory access programming in MPI-3. ACM Trans. Parallel Comput. 2 (2), 9:1–9:26 (2015). doi:10.1145/2780584

  9. Infiniband Trade Association: InfiniBand Architecture Specification Volume 2. https://cw.infinibandta.org/document/dl/7155 (2006)

  10. Kowalewski, R., Fürlinger, K.: Nasty-MPI: Debugging Synchronization Errors in MPI-3 One-Sided Applications. Lecture Notes in Computer Science, pp. 51–62. Springer, Cham (2016). doi:10.1007/978-3-319-43659-3_4. http://dx.doi.org/10.1007/978-3-319-43659-3_4

  11. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21 (7), 558–565 (1978). doi:10.1145/359545.359563

  12. Leibniz Supercomputing Centre, Munich, Germany: SuperMUC Petascale System. https://www.lrz.de/services/compute/supermuc/systemdescription/. Last accessed 2016

  13. Luecke, G.R., Spanoyannis, S., Kraeva, M.: The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI origin 2000 and a Cray T3E-600: performances. Concurr. Comput. Pract. Exper. 16 (10), 1037–1060 (2004). doi:10.1002/cpe.v16:10

  14. Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9 (1), 21–65 (1991). doi:10.1145/103727.103729

  15. MPI Forum: MPI: A Message-Passing Interface Standard. Version 3.0 (2012). Available at: http://www.mpi-forum.org

  16. National Energy Research Scientific Computing Center, United States: Edison System Configuration. https://www.nersc.gov/users/computational-systems/edison/configuration/. Last accessed 2016

  17. Park, C.S., Sen, K., Hargrove, P., Iancu, C.: Efficient data race detection for distributed memory parallel programs. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pp. 51:1–51:12. ACM, New York (2011). doi:10.1145/2063384.2063452

  18. Pervez, S., Gopalakrishnan, G., Kirby, R., Thakur, R., Gropp, W.: Formal verification of programs that use MPI one-sided communication. In: Mohr, B., Träff, J., Worringen, J., Dongarra, J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. Lecture Notes in Computer Science, vol. 4192, pp. 30–39. Springer, Berlin/Heidelberg (2006). doi:10.1007/11846802_13

Acknowledgements

We gratefully acknowledge funding by the German Research Foundation (DFG) through the German Priority Programme 1648 Software for Exascale Computing (SPPEXA). This work is an extended revision of a previously published paper [10].

Author information

Corresponding author

Correspondence to Roger Kowalewski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kowalewski, R., Fürlinger, K. (2017). Debugging Latent Synchronization Errors in MPI-3 One-Sided Communication. In: Niethammer, C., Gracia, J., Hilbrich, T., Knüpfer, A., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-56702-0_5
