Performance Enhancement by Means of Task Replication
To make systems in which tasks may fail fault-tolerant, traditional methods deploy multiple servers as replicas to perform the same task. Further, in real-time systems, computations have to meet strict time constraints; a delayed output is unacceptable, even if correct. This paper considers the effectiveness of sending task replicas to multiple servers simultaneously, and using the result from whichever one responds first, as a means of reducing response time and improving fault tolerance. Once a request completes execution successfully at one server, it immediately cancels (kills) its replicas that remain at other servers. We assume a Markovian system and, for the case of two replicas, use the generating function method to determine the Laplace transform of the response time probability distribution, jointly with the probability that not all replicas fail. When the failure rate of each task is greater than the service rate of the server, we make the approximation that the queues are independent, each with a geometric queue length probability distribution at equilibrium. We compare our approximation with simulation results as well as with the exact solution in a truncated state space and find that, for failure rates in that region, the approximation is generally good. At lower failure rates, the method of spectral expansion provides an excellent approximation in a truncated, multi-mode, two-dimensional Markov process.
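The first-response-wins idea can be illustrated with a deliberately simplified sketch. It is not the paper's model: it assumes no queueing at all, so each replica starts service immediately, and it assumes each replica has an independent exponential service time (rate `mu`) and an independent exponential time to failure (rate `alpha`). The function names and parameters are hypothetical, chosen only for this illustration. Under these assumptions a single replica succeeds with probability mu/(mu + alpha), and the task succeeds unless every replica fails.

```python
import random


def success_probability(mu, alpha, replicas=2):
    """Closed-form probability that not all replicas fail.

    Assumption (not from the paper): no queueing, and each replica
    independently fails before finishing with probability alpha / (mu + alpha).
    """
    p_replica_fails = alpha / (mu + alpha)
    return 1.0 - p_replica_fails ** replicas


def simulate(mu, alpha, replicas=2, trials=50_000, seed=0):
    """Monte Carlo estimate of (success probability, mean response time).

    The response time of a successful task is the completion time of the
    first replica that finishes before failing; the remaining replicas
    would be cancelled at that instant. The mean is conditional on success.
    """
    rng = random.Random(seed)
    successes = 0
    total_time = 0.0
    for _trial in range(trials):
        finish_times = []
        for _replica in range(replicas):
            service = rng.expovariate(mu)     # time to complete service
            failure = rng.expovariate(alpha)  # time until this replica fails
            if service < failure:
                finish_times.append(service)
        if finish_times:
            successes += 1
            total_time += min(finish_times)   # first successful response wins
    p_success = successes / trials
    mean_response = total_time / successes if successes else float("nan")
    return p_success, mean_response


if __name__ == "__main__":
    for n in (1, 2):
        exact = success_probability(1.0, 1.0, replicas=n)
        p_hat, mean_hat = simulate(1.0, 1.0, replicas=n)
        print(f"replicas={n}: exact P(success)={exact:.3f}, "
              f"simulated={p_hat:.3f}, mean response={mean_hat:.3f}")
```

Even in this toy setting, adding a second replica raises the probability of success and lowers the mean response time of successful tasks. The paper's analysis additionally accounts for queueing at each server, which this sketch ignores.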
Keywords: fault-tolerance, reliability, response time