Performance Enhancement by Means of Task Replication

  • Peter G. Harrison
  • Zhan Qiu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8168)


To make systems in which tasks may fail fault-tolerant, traditional methods deploy multiple servers as replicas that perform the same task. Moreover, in real-time systems, computations must meet strict time constraints: a delayed output is unacceptable, even if it is correct. This paper considers sending task replicas to multiple servers simultaneously, and using the result from whichever server responds first, as a means of reducing response time and improving fault tolerance. Once a request completes execution successfully at one server, it immediately cancels (kills) its replicas that remain at the other servers. We assume a Markovian system and, for the case of two replicas, use the generating function method to determine the Laplace transform of the response time probability distribution, jointly with the probability that not all replicas fail. When the failure rate of each task is greater than the service rate of the server, we approximate the queues as independent, each with a geometric queue length probability distribution at equilibrium. We compare this approximation with simulation results and with the exact solution in a truncated state space, and find that for failure rates in that region the approximation is generally good. At lower failure rates, the method of spectral expansion provides an excellent approximation in a truncated, multi-mode, two-dimensional Markov process.
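The following minimal Python sketch illustrates the replicate-and-cancel policy in its simplest case only. A single task is sent to two idle servers, and there is no queueing. It rests on illustrative assumptions, not on the paper's exact model. Each replica completes after an exponential time with rate `mu` unless it first fails after an independent exponential time with rate `nu`. Whichever replica succeeds first cancels the other. The names `closed_form` and `simulate` are hypothetical helpers. The paper's queueing analysis (generating functions, spectral expansion) is not reproduced here.

```python
import random


def closed_form(mu, nu):
    """Return (P(not all replicas fail), E[response time | success]) for ONE
    task replicated to two idle servers.

    Illustrative assumptions: each replica completes after an Exp(mu) time
    unless it first fails after an independent Exp(nu) time, and the first
    successful replica cancels the other.
    """
    alpha = mu + nu                   # rate at which a replica's outcome is decided
    p = mu / alpha                    # probability that one replica succeeds
    p_success = 1.0 - (1.0 - p) ** 2  # probability that not both replicas fail
    # Both replicas succeed: response time is the minimum of two Exp(alpha)
    # times, with mean 1/(2*alpha). Exactly one succeeds: its completion time
    # is Exp(alpha), with mean 1/alpha, because for competing exponentials the
    # event time is independent of which event occurred.
    mean_t = (p * p / (2.0 * alpha) + 2.0 * p * (1.0 - p) / alpha) / p_success
    return p_success, mean_t


def simulate(mu, nu, n=200_000, seed=1):
    """Monte Carlo estimate of the same two quantities (requires mu, nu > 0)."""
    rng = random.Random(seed)
    successes = 0
    total_time = 0.0
    for _ in range(n):
        first_done = None  # earliest successful completion among the replicas
        for _ in range(2):
            service = rng.expovariate(mu)
            failure = rng.expovariate(nu)
            if service < failure:  # this replica completes before failing
                if first_done is None or service < first_done:
                    first_done = service
        # The earliest success is the response; the other replica is cancelled.
        if first_done is not None:
            successes += 1
            total_time += first_done
    return successes / n, total_time / successes


if __name__ == "__main__":
    print("closed form:", closed_form(1.0, 2.0))
    print("simulation: ", simulate(1.0, 2.0))
```

Take `mu = 1` and `nu = 2`, so the failure rate is above the service rate. The closed form then gives a success probability of 5/9 (about 0.556) and a conditional mean response time of 0.3. A single unreplicated server gives 1/3 for both quantities. The simulation should agree with the closed form to within sampling error.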


Keywords: Fault-tolerance, Reliability, Response time


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peter G. Harrison (1)
  • Zhan Qiu (1)
  1. Department of Computing, Imperial College London, London, UK