Fault-Tolerant Middleware and the Magical 1%

Dumitraş, Tudor; Narasimhan, Priya

doi:10.1007/11587552_24

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3790)

Included in the following conference series:

ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing

Abstract

Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree, or the request rates. However, by selectively filtering out a “magical 1%” of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded and made easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, which dramatically increases our ability to tune and control a middleware system in a predictable manner.
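
As a rough illustration of the filtering idea described in the abstract, the following Python sketch drops the largest 1% of a set of latency samples and compares the maximum before and after. The abstract does not spell out the exact procedure, so this rests on assumptions: that the filtered observations are simply the highest values, that the helper name filter_top_fraction is hypothetical, and that the latency data is synthetic rather than taken from the paper's experiments.

```python
import math

def filter_top_fraction(samples, fraction=0.01):
    """Return the samples, sorted, with the largest `fraction` of values removed.

    Assumption: the "magical 1%" is taken to be the highest observations.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(samples)
    # Number of observations to discard from the high end.
    drop = math.ceil(len(ordered) * fraction)
    return ordered[:len(ordered) - drop]

# Synthetic latencies in microseconds (not data from the paper):
# 990 ordinary invocations plus 10 very slow ones.
latencies = [1000 + (i % 50) for i in range(990)] + [250_000] * 10

kept = filter_top_fraction(latencies)
raw_max = max(latencies)    # dominated by the rare slow invocations
bounded_max = max(kept)     # maximum once the top 1% is filtered out

print(f"raw max: {raw_max} us, max after filtering 1%: {bounded_max} us")
```

The maximum of the filtered samples is the largest value that at least 99% of the observations do not exceed, which is the kind of percentile-based bound the abstract refers to.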

This work has been partially supported by the NSF CAREER grant CCR-0238381, the DARPA PCES contract F33615-03-C-4110, and also in part by the General Motors Collaborative Research Laboratory at Carnegie Mellon University.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Tudor Dumitraş & Priya Narasimhan

Editor information

Editors and Affiliations

Systems Group Department of Computer Science, ETH Zurich, Switzerland
Gustavo Alonso

About this paper

Cite this paper

Dumitraş, T., Narasimhan, P. (2005). Fault-Tolerant Middleware and the Magical 1%. In: Alonso, G. (eds) Middleware 2005. Middleware 2005. Lecture Notes in Computer Science, vol 3790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587552_24

DOI: https://doi.org/10.1007/11587552_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30323-7
Online ISBN: 978-3-540-32269-6
eBook Packages: Computer Science (R0)
