Abstract
Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree, or the request rate. However, by selectively filtering out a “magical 1%” of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded and made easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, which dramatically increases our ability to tune and control a middleware system in a predictable manner.
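The filtering idea can be illustrated with a minimal sketch. This is not the authors' analysis code: the helper `trim_top_percent`, the choice to discard the largest observations, and the latency values are all assumptions made for illustration. The sketch drops the highest 1% of a set of latency samples and compares the maximum before and after.

```python
def trim_top_percent(samples, percent=1):
    """Drop the largest `percent` percent of samples.

    Hypothetical helper for illustration only. Returns (kept, dropped),
    both sorted in ascending order.
    """
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    ordered = sorted(samples)
    # Integer arithmetic avoids floating-point rounding in the cut-off.
    n_drop = (len(ordered) * percent) // 100
    cut = len(ordered) - n_drop
    return ordered[:cut], ordered[cut:]


# Synthetic latencies in microseconds (assumed values, not measured data):
# 198 ordinary invocations plus 2 very slow ones.
latencies_us = [1000 + (i % 50) for i in range(198)] + [250_000, 900_000]

kept, dropped = trim_top_percent(latencies_us, percent=1)
raw_max = max(latencies_us)   # dominated by the outliers
bounded_max = max(kept)       # maximum of the remaining 99%

print(f"raw max: {raw_max} us, max after trimming 1%: {bounded_max} us")
```

In this synthetic data set the two outliers fall entirely within the discarded 1%, so the maximum of the retained samples stays close to the typical latency. Real measurements may behave differently, which is why the abstract speaks of bounds that hold with some level of confidence.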
This work has been partially supported by the NSF CAREER grant CCR-0238381, the DARPA PCES contract F33615-03-C-4110, and also in part by the General Motors Collaborative Research Laboratory at Carnegie Mellon University.
© 2005 IFIP International Federation for Information Processing
Dumitraş, T., Narasimhan, P. (2005). Fault-Tolerant Middleware and the Magical 1%. In: Alonso, G. (ed.) Middleware 2005. Lecture Notes in Computer Science, vol 3790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587552_24
DOI: https://doi.org/10.1007/11587552_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30323-7
Online ISBN: 978-3-540-32269-6
eBook Packages: Computer Science, Computer Science (R0)