A distributed algorithm for detecting communication deadlocks
A distributed system is an interconnected network of computing elements, or nodes, each of which has its own storage. A distributed program is a collection of processes. The processes execute asynchronously, possibly on different nodes of a distributed system, and communicate with each other to realize a common goal. In such an environment, a group of processes may sometimes become involved in a communication deadlock: a situation in which each member process of the group is waiting for some member process to communicate with it, but no member is attempting communication with it. In this paper, we present an algorithm for detecting such communication deadlocks. The algorithm is distributed, i.e., processes detect deadlocks during the course of their communication transactions, without the aid of a central controller. The detection scheme does not assume any a priori structure among processes, and detection is made "on the fly" without freezing normal activities. The proposed scheme is suitable for implementation within the runtime support or kernel of a distributed programming language.
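The definition of a communication deadlock given above can be illustrated with a small sketch. This is not the distributed algorithm of the paper, which the abstract does not describe; it is a centralized check of the definition over a global snapshot. The function name `is_communication_deadlocked` and the maps `waiting_for` and `attempting` are hypothetical, and the sketch assumes that every partner a member waits for lies inside the group.

```python
from typing import Dict, Set


def is_communication_deadlocked(
    group: Set[str],
    waiting_for: Dict[str, Set[str]],
    attempting: Dict[str, Set[str]],
) -> bool:
    """Check the deadlock definition for `group` on a global snapshot.

    waiting_for[p]: processes that p is blocked waiting to hear from.
    attempting[q]:  processes that q is currently trying to communicate with.
    Both maps are illustrative assumptions, not structures from the paper.
    """
    if not group:
        return False
    for p in group:
        partners = waiting_for.get(p, set())
        # p must be waiting, and (by assumption) only for members of the group.
        if not partners or not partners <= group:
            return False
        # No member of the group may be attempting communication with p.
        if any(p in attempting.get(q, set()) for q in group):
            return False
    return True


# Three processes waiting on each other in a ring, nobody sending.
ring = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
print(is_communication_deadlocked({"A", "B", "C"}, ring, {}))
```

A real detector would have no such global snapshot; the point of the paper's scheme is that processes reach the same conclusion through their own communication transactions, without a central controller.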
Keywords: Hamiltonian Cycle, Control Message, Central Controller, Quiet State, Communicate Sequential Process