The Surprising Power of Epidemic Communication
The focus of this position paper is the most appropriate form of middleware to offer in support of distributed system management, control, information sharing and multicast communication. Our premise is that existing technology has been deficient in all of these areas. If recent advances can be carried into general practice, they could enable a new generation of better distributed systems. These would be valuable in settings ranging from “critical infrastructure” areas, such as air traffic control and control of the restructured electric power grid, to emerging areas, such as large-scale sensor networks, data mining and data fusion.

The middleware domain of interest to us has witnessed some three decades of debate between two kinds of distributed computing systems. One kind offers strong properties, such as virtual synchrony, fault tolerance, security, or guaranteed consistency. The other offers weak properties; it is typified by web browsers but extends into the broader area of web services and network applications built from remote procedure calls that use timeouts for failure detection. It seems fair to say that neither has been completely satisfactory, and commercial platforms have yet to include either kind of technology in a standard, widely available form.

Systems with stronger guarantees would be preferable to systems with weaker guarantees if the two categories were comparable in other dimensions, including performance, ease of use, programming support, configuration and management, runtime control, and complexity of the runtime environment. However, the two classes differ in most of these respects, so the question is more subtle. Systems offering stronger guarantees are very often slow, scale poorly, and require complex infrastructure. Commercial vendors have rarely supported them to the same degree as other technologies. Their programming tools are inferior or completely lacking, and their integration with commercial platforms is poor.
Keywords: Strong Property; Remote Procedure Call; Multicast Communication; Byzantine Agreement; Electric Power Grid