IFIP International Conference on Network and Parallel Computing

NPC 2012: Network and Parallel Computing, pp. 172–179

dMPI: Facilitating Debugging of MPI Programs via Deterministic Message Passing

  • Xu Zhou,
  • Kai Lu,
  • Xicheng Lu,
  • Xiaoping Wang &
  • Baohua Fan
  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7513)

Abstract

This paper presents a novel deterministic MPI implementation (dMPI) that facilitates the debugging of MPI programs. Unlike existing approaches, dMPI guarantees determinism inherently, without relying on external support such as logs, and thereby achieves both convenience and performance. The basic idea of dMPI is to use deterministic logical time to resolve message races and to control asynchronous transmissions, which eliminates the nondeterministic behavior of the existing message-passing mechanism. To handle the deadlocks that dMPI itself can introduce, we also integrate dMPI with a lightweight deadlock checker that dynamically detects and resolves them. We have implemented dMPI and evaluated it with the NAS Parallel Benchmarks (NPB). The results show that dMPI guarantees determinism while incurring modest overhead (8% on average).
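The central idea above, resolving a message race by logical time rather than by arrival order, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not dMPI's actual algorithm: it assumes Lamport-style logical clocks, assumes every message competing for a wildcard (`MPI_ANY_SOURCE`-style) receive is already buffered, and breaks ties on logical time by sender rank. The names `Message`, `LogicalClock`, `deterministic_match` and `drain_wildcard_receives` are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source: int     # sender rank
    send_time: int  # sender's logical clock value when the message was sent
    payload: str


class LogicalClock:
    """Hypothetical Lamport-style clock for one receiving process."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event or send: advance the clock by one.
        self.time += 1
        return self.time

    def merge(self, msg_time: int) -> int:
        # Receive rule: jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


def deterministic_match(pending: list[Message]) -> Message:
    """Choose the message a wildcard receive matches, ignoring arrival order.

    Candidates are ordered by (send_time, source): logical time first, with
    the sender rank as an assumed tie-breaker.
    """
    if not pending:
        raise ValueError("no pending messages to match")
    return min(pending, key=lambda m: (m.send_time, m.source))


def drain_wildcard_receives(arrivals: list[Message], clock: LogicalClock) -> list[str]:
    """Consume all buffered messages with repeated wildcard receives.

    Assumes every competing message has already arrived; returns the
    payloads in the order they were matched.
    """
    pending = list(arrivals)
    matched = []
    while pending:
        msg = deterministic_match(pending)
        pending.remove(msg)
        clock.merge(msg.send_time)
        matched.append(msg.payload)
    return matched


# The same three messages, delivered in every possible arrival order.
msgs = [Message(1, 3, "a"), Message(2, 1, "b"), Message(3, 3, "c")]
orders = {
    tuple(drain_wildcard_receives(list(p), LogicalClock()))
    for p in itertools.permutations(msgs)
}
print(orders)  # {('b', 'a', 'c')}
```

All six arrival orders collapse to one match order, which is the property a deterministic message-passing layer needs for reproducible debugging. A real implementation cannot assume every candidate has arrived: it has to wait until no message with a smaller logical time can still be sent, and that waiting is plausibly one way such a scheme can introduce the deadlocks the abstract mentions.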

Author information

Authors and Affiliations

  1. School of Computer, National University of Defense Technology, Changsha, Hunan, China, 410073

    Xu Zhou, Kai Lu, Xicheng Lu, Xiaoping Wang & Baohua Fan

Editor information

Editors and Affiliations

  1. Department of Computer Science and Engineering, SeoulTech, 172 Gongreung 2-dong, Nowon-gu, 139-743, Seoul, Korea

    James J. Park

  2. School of Information Technologies, The University of Sydney, Building J12, 2006, Sydney, NSW, Australia

    Albert Zomaya

  3. Division of Computer Engineering, Mokwon University, 88 Do-An-Buk-Ro, Seo-gu, 302-729, Daejeon, Korea

    Sang-Soo Yeo

  4. Department of Computer and Information Science and Engineering, University of Florida, CSE 301, 32611, Gainesville, FL, USA

    Sartaj Sahni

Copyright information

© 2012 IFIP International Federation for Information Processing

About this paper

Cite this paper

Zhou, X., Lu, K., Lu, X., Wang, X., Fan, B. (2012). dMPI: Facilitating Debugging of MPI Programs via Deterministic Message Passing. In: Park, J.J., Zomaya, A., Yeo, SS., Sahni, S. (eds) Network and Parallel Computing. NPC 2012. Lecture Notes in Computer Science, vol 7513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35606-3_20

  • DOI: https://doi.org/10.1007/978-3-642-35606-3_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35605-6

  • Online ISBN: 978-3-642-35606-3

  • eBook Packages: Computer Science, Computer Science (R0)
