Formal and experimental validation of a low overhead execution replay mechanism
This paper presents a record-replay mechanism for parallel programs written in a parallel programming model based on remote procedure calls (RPC). The mechanism, which will serve as the basis for a user-level debugger, exploits properties of the programming model to drastically reduce the number of events that must be recorded. A formal proof is given that replayed executions are equivalent to the recorded ones. Systematic measurements of the time overhead of recording indicate that it is low enough for recording mode to be used as the normal execution mode. Similar techniques can be applied to other programming models.
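As an illustration only, the following Python sketch shows the general record-replay idea. It is not the paper's actual mechanism, whose recording rules the abstract does not spell out. During recording, only the nondeterministic decision (which caller's pending RPC the server serves next) is logged. During replay, the server is forced to make the same decisions, so it reaches the same state. The `Server` class and `run` helper are hypothetical. The sketch also assumes FIFO channels between each caller and the server, which lets the log store just one sender id per served call instead of the full message.

```python
import random
from collections import deque


class Server:
    """Hypothetical RPC server whose final state depends on service order.

    Assumption: each caller-to-server channel is FIFO, so the order of calls
    from one sender is fixed and only the sender id of each served call
    needs to be logged.
    """

    def __init__(self, mode, log=None):
        if mode not in ("record", "replay"):
            raise ValueError(mode)
        self.mode = mode
        self.log = [] if log is None else list(log)  # recorded sender ids
        self.cursor = 0      # next log entry to honour during replay
        self.state = 0
        self.served = []     # payloads in the order they were served

    def pick_sender(self, queues, rng):
        ready = sorted(s for s, q in queues.items() if q)
        if self.mode == "record":
            sender = rng.choice(ready)      # the only nondeterministic decision...
            self.log.append(sender)         # ...so it is the only thing logged
        else:
            sender = self.log[self.cursor]  # force the recorded decision
            self.cursor += 1
        return sender

    def serve(self, payload):
        # Order-sensitive update: different service orders give different states.
        self.state = self.state * 31 + payload
        self.served.append(payload)


def run(server, calls, rng):
    """Serve every (sender, payload) call; calls from one sender stay in FIFO order."""
    queues = {}
    for sender, payload in calls:
        queues.setdefault(sender, deque()).append(payload)
    while any(queues.values()):
        sender = server.pick_sender(queues, rng)
        server.serve(queues[sender].popleft())
    return server.state


# Usage: record one execution, then replay it with a different random source.
calls = [(0, 1), (1, 2), (0, 3), (2, 4), (1, 5)]
recorder = Server("record")
recorded_state = run(recorder, calls, random.Random(7))
replayer = Server("replay", recorder.log)
replayed_state = run(replayer, calls, random.Random(12345))  # rng unused in replay
```

A real replay would block until the logged caller's request arrives. The sketch sidesteps this by making all calls available up front.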
Keywords: Instant Replay, parallel debugging, deterministic re-execution, Remote Procedure Call