Abstract
This paper presents dMPI, a novel deterministic MPI implementation that facilitates the debugging of MPI programs. Unlike existing approaches, dMPI guarantees determinism inherently, without relying on external support such as logs, and thereby achieves both convenience and performance. The basic idea is to use deterministic logical time to resolve message races and to control asynchronous transmissions, which eliminates the nondeterministic behaviors of the existing message passing mechanism. Because enforcing this ordering can itself introduce deadlocks, dMPI is integrated with a lightweight deadlock checker that dynamically detects and resolves them. We have implemented dMPI and evaluated it using the NPB benchmarks. The results show that dMPI guarantees determinism while incurring modest overhead (8% on average).
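To make the idea of resolving message races with logical time more concrete, the following toy sketch orders racing messages by a (logical time, sender rank) key instead of by physical arrival order. It is only an illustration under stated assumptions, not dMPI's actual mechanism: the `Message` type, its `logical_time` field, the tie-break on sender rank, and both helper functions are hypothetical names introduced here, and the sketch assumes all competing messages are already known to the receiver.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Message:
    sender: int        # rank of the sending process (hypothetical field)
    logical_time: int  # sender's deterministic logical clock at send time (assumed)
    payload: str


def deterministic_order(arrived):
    """Order messages by (logical_time, sender), ignoring arrival order."""
    return sorted(arrived, key=lambda m: (m.logical_time, m.sender))


def match_wildcard_receive(arrived):
    """Pick the message a wildcard receive would match in this toy model."""
    return min(arrived, key=lambda m: (m.logical_time, m.sender))


# Three messages racing for the same wildcard receive.
msgs = [Message(2, 5, "c"), Message(0, 7, "a"), Message(1, 5, "b")]

# Try every possible physical arrival order; collect the resulting delivery orders.
orders = {
    tuple(m.payload for m in deterministic_order(p))
    for p in permutations(msgs)
}
print(orders)  # a single delivery order, whatever the arrival order was
```

In this model every one of the six arrival orders yields the same delivery sequence, which is the property a debugger needs in order to reproduce a run. A real implementation would additionally have to decide when it is safe to deliver a message, since an earlier-stamped message may still be in flight.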
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Zhou, X., Lu, K., Lu, X., Wang, X., Fan, B. (2012). dMPI: Facilitating Debugging of MPI Programs via Deterministic Message Passing. In: Park, J.J., Zomaya, A., Yeo, SS., Sahni, S. (eds) Network and Parallel Computing. NPC 2012. Lecture Notes in Computer Science, vol 7513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35606-3_20
DOI: https://doi.org/10.1007/978-3-642-35606-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35605-6
Online ISBN: 978-3-642-35606-3
eBook Packages: Computer Science (R0)