A Survey of Stability Results for Redundancy Systems

Part of the Emergence, Complexity and Computation book series (ECC, volume 41)

Abstract

Redundancy mechanisms consist of sending several copies of the same job to a subset of servers. Redundancy is one of the most promising ways to exploit diversity in multi-server applications. However, its pros and cons are still not sufficiently understood for realistic models with general statistical properties of service-time distributions and general correlation structures between copies. We survey recent results on the stability (arguably the first performance benchmark) of systems with cancel-on-completion redundancy. We also point out open questions and conjectures.
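To make cancel-on-completion and the role of the correlation between copies concrete, here is a minimal Monte Carlo sketch. It assumes a hypothetical setting that is not taken from the survey: one job sends d copies to d idle servers that start at the same time, copy sizes are exponential with mean 1, and the remaining copies are cancelled as soon as the first one completes. The function `cancel_on_completion` and its parameters are illustrative names. With independent copies the job finishes after the minimum of d exponentials (mean 1/d), so the d servers together spend about one unit of work; with identical copies nothing is gained and the servers spend about d units, which hints at why the correlation structure of copies matters for stability.

```python
import random


def cancel_on_completion(d, n_jobs=100_000, identical=False, mean=1.0, seed=1):
    """Monte Carlo estimate of (mean completion time, mean work spent) for a job with d copies.

    Hypothetical setting: the d copies start together on d idle servers, copy sizes are
    exponential with the given mean, and the other copies are cancelled as soon as the
    first one finishes. identical=True gives all copies the same size (fully correlated).
    """
    rng = random.Random(seed)
    rate = 1.0 / mean
    total = 0.0
    for _ in range(n_jobs):
        if identical:
            finish = rng.expovariate(rate)  # every copy has the same size
        else:
            finish = min(rng.expovariate(rate) for _ in range(d))  # i.i.d. copies
        total += finish
    mean_time = total / n_jobs
    # Each of the d servers stays busy until the first copy completes.
    return mean_time, d * mean_time


for identical in (False, True):
    t, w = cancel_on_completion(3, identical=identical)
    print(f"identical={identical}: time {t:.3f}, work {w:.3f}")
```

With d = 3, the independent case should give roughly time 0.333 and work 1.0, and the identical case roughly time 1.0 and work 3.0.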

Keywords

  • Redundancy
  • Load balancing
  • Stability

AMS (2020) Subject Classification:

  • Primary 60K25
  • Secondary 68M20

Fig. 1.
Fig. 2.
Fig. 3.

Notes

  1.

    When \(d=K-1\), d servers process copies of one job and the remaining \(K-d=1\) server serves one additional job; hence \(\bar{\ell }=2\). When instead \(d=1\), there is no redundancy and each server serves one job in the saturated system, i.e., \(\bar{\ell }=K\). When \(d=K\), the system behaves as a single server with capacity \(\mu \), that is, \(\bar{\ell }=1\).

  2.

    Let \(\bar{F}_X(t)=\mathbb{P}(X>t)\) denote the survival function of X. X is said to be New-Better-than-Used (NBU) if for all \(t_1,t_2\ge 0\), \( \bar{F}_X (t_1+t_2) \le \bar{F}_X(t_1)\bar{F}_X(t_2).\) X is said to be New-Worse-than-Used (NWU) if for all \(t_1,t_2\ge 0\), \(\bar{F}_X (t_1+t_2) \ge \bar{F}_X(t_1)\bar{F}_X(t_2).\) A sufficient condition for X to be NBU (NWU) is that its hazard rate r(x) is increasing (decreasing) in x.
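The NBU/NWU definitions in Note 2 can be sanity-checked numerically on a grid of points. The sketch below uses the Weibull survival function \(\bar{F}(t)=e^{-(t/\lambda)^k}\) purely as an illustrative example (it is not taken from the survey): its hazard rate \(r(x)=(k/\lambda)(x/\lambda)^{k-1}\) is increasing for k > 1 and decreasing for k < 1, so the sufficient condition predicts NBU and NWU respectively, while k = 1 (the exponential case) is both. The helper names are hypothetical, and a finite grid check is only a sanity check, not a proof.

```python
import math
from functools import partial


def weibull_survival(t, shape, scale=1.0):
    """Survival function P(X > t) = exp(-(t/scale)**shape) of a Weibull variable, t >= 0."""
    return math.exp(-((t / scale) ** shape))


def is_nbu_on_grid(survival, grid, tol=1e-12):
    """Check Fbar(t1 + t2) <= Fbar(t1) * Fbar(t2) for all t1, t2 in the grid (NBU)."""
    return all(survival(a + b) <= survival(a) * survival(b) + tol
               for a in grid for b in grid)


def is_nwu_on_grid(survival, grid, tol=1e-12):
    """Check Fbar(t1 + t2) >= Fbar(t1) * Fbar(t2) for all t1, t2 in the grid (NWU)."""
    return all(survival(a + b) >= survival(a) * survival(b) - tol
               for a in grid for b in grid)


grid = [0.1 * i for i in range(31)]  # nonnegative points t in [0, 3]
for k in (0.5, 1.0, 2.0):
    fbar = partial(weibull_survival, shape=k)
    print(k, is_nbu_on_grid(fbar, grid), is_nwu_on_grid(fbar, grid))
```

The tolerance absorbs floating-point rounding, which matters in the exponential case where the two sides are equal.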

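The three boundary cases of Note 1 can be reproduced by counting how many distinct jobs are in service in a snapshot of the saturated system. The sketch assumes a hypothetical model that the note does not spell out: FCFS service at every server, copies of each job placed on d distinct servers chosen uniformly at random, and a long but finite queue standing in for the saturated system; `jobs_in_service` is an illustrative name. For \(d=1\), \(d=K-1\) and \(d=K\) the count is K, 2 and 1 whatever the placement, as in Note 1. For intermediate d the count depends on the random configuration, and one snapshot of randomly placed copies is not the stationary quantity \(\bar{\ell }\).

```python
import random


def jobs_in_service(K, d, n_jobs=1000, seed=0):
    """Count distinct jobs in service in one snapshot of a saturated FCFS redundancy-d system.

    Hypothetical model: n_jobs jobs wait in arrival order, each with copies on d distinct
    servers drawn uniformly at random, and every server works on the oldest job that has a
    copy there. A long finite queue stands in for the saturated (infinite) queue.
    """
    if not 1 <= d <= K:
        raise ValueError("need 1 <= d <= K")
    rng = random.Random(seed)
    # Server sets of the waiting jobs, oldest first.
    queue = [frozenset(rng.sample(range(K), d)) for _ in range(n_jobs)]
    in_service = set()
    for server in range(K):
        # FCFS: the oldest job with a copy on this server is the one served there.
        for idx, servers in enumerate(queue):
            if server in servers:
                in_service.add(idx)
                break
    return len(in_service)


K = 5
for d in (1, K - 1, K):
    print(d, jobs_in_service(K, d))  # expected: "1 5", then "4 2", then "5 1"
```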
Acknowledgement

The research of E. Anton was supported, and that of M. Jonckheere partially supported, by the French “Agence Nationale de la Recherche (ANR)” through the project ANR-15-CE25-0004 (ANR JCJC RACON). U. Ayesta has received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19).

Author information

Correspondence to Elene Anton.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M. (2021). A Survey of Stability Results for Redundancy Systems. In: Piunovskiy, A., Zhang, Y. (eds) Modern Trends in Controlled Stochastic Processes:. Emergence, Complexity and Computation, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-76928-4_13

  • DOI: https://doi.org/10.1007/978-3-030-76928-4_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76927-7

  • Online ISBN: 978-3-030-76928-4

  • eBook Packages: Engineering, Engineering (R0)