A Survey of Stability Results for Redundancy Systems

Anton, Elene; Ayesta, Urtzi; Jonckheere, Matthieu; Verloop, Ina Maria

doi:10.1007/978-3-030-76928-4_13


Part of the book series: Emergence, Complexity and Computation (ECC, volume 41)


Abstract

Redundancy mechanisms consist of sending several copies of the same job to a subset of servers. They constitute one of the most promising ways to exploit diversity in multi-server applications. However, their pros and cons are still not sufficiently understood in the context of realistic models with generic statistical properties of service-time distributions and correlation structures of copies. We aim to give a survey of recent results concerning the stability (arguably the first benchmark of performance) of systems with cancel-on-completion redundancy. We also point out open questions and conjectures.
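The following Python sketch illustrates cancel-on-completion and shows why the correlation structure of copies matters. It serves a single job whose \(d\) copies start at the same time on \(d\) idle servers. The sketch rests on assumptions that do not come from the survey: copy sizes are exponential with mean 1, servers run at unit speed, there is no queueing, and the helper names `one_job` and `averages` are only illustrative. It compares identical copies, where every copy has the same size, with independent and identically distributed (i.i.d.) copies.

```python
import random
import statistics


def one_job(d: int, correlated: bool, rng: random.Random) -> tuple[float, float]:
    """Serve one job whose d copies all start at once on d idle servers.

    Illustrative assumptions: exponential copy sizes with mean 1 and
    unit-speed servers. correlated=True gives identical copies (same size);
    correlated=False gives i.i.d. copy sizes. Under cancel-on-completion the
    job leaves when its first copy finishes and the other copies are cancelled.

    Returns (response time, total server time consumed by the job).
    """
    if correlated:
        sizes = [rng.expovariate(1.0)] * d          # every copy has the same size
    else:
        sizes = [rng.expovariate(1.0) for _ in range(d)]
    response = min(sizes)                           # first copy to finish
    return response, d * response                   # all d servers busy until then


def averages(d: int, correlated: bool, n: int = 100_000, seed: int = 1):
    """Monte Carlo means of (response time, server time) over n independent jobs."""
    rng = random.Random(seed)
    samples = [one_job(d, correlated, rng) for _ in range(n)]
    return (statistics.fmean(s[0] for s in samples),
            statistics.fmean(s[1] for s in samples))


for d in (1, 3):
    for correlated in (True, False):
        label = "identical" if correlated else "i.i.d."
        print(d, label, averages(d, correlated))
```

In this isolated setting, identical copies leave the mean response time at about 1 while each job uses about \(d\) units of server time. With i.i.d. exponential copies the mean response time falls to about \(1/d\) and the server time stays near 1. The example only suggests how copy correlation can change the capacity left for other jobs. It is not a stability result.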

Notes

1.
When \(d=K-1\), there are \(d\) servers that process copies of one job, and the remaining \(K-d=1\) server serves one additional job; hence \(\bar{\ell}=2\). When instead \(d=1\), there is no redundancy and each server serves one job in the saturated system, i.e., \(\bar{\ell}=K\). When \(d=K\), the system behaves as a single server with capacity \(\mu\), that is, \(\bar{\ell}=1\).
2.
\(X\) is said to be New-Better-than-Used (NBU) if for all \(t_1,t_2\ge 0\), \(\bar{F}_X(t_1+t_2) \le \bar{F}_X(t_1)\bar{F}_X(t_2)\), where \(\bar{F}_X(t)=\mathbb{P}(X>t)\) denotes the survival function of the nonnegative random variable \(X\). \(X\) is said to be New-Worse-than-Used (NWU) if for all \(t_1,t_2\ge 0\), \(\bar{F}_X(t_1+t_2) \ge \bar{F}_X(t_1)\bar{F}_X(t_2)\). A sufficient condition for \(X\) to be NBU (NWU) is that it has an increasing (decreasing) hazard rate, i.e., \(r(x)=f_X(x)/\bar{F}_X(x)\) is increasing (decreasing) in \(x\), where \(f_X\) is the density of \(X\).
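A short Python sketch can check the three boundary cases of Note 1. It assumes a hypothetical snapshot of a saturated FCFS redundancy-\(d\) system with an infinite backlog. Each job sends copies to \(d\) servers chosen uniformly at random, and each server serves the oldest job that has a copy there. The function name `jobs_in_service` is only illustrative. The function counts the distinct jobs in service in a single such configuration, not the long-run average \(\bar{\ell}\). For \(d\in\{1,K-1,K\}\), however, the count is deterministic and matches the note.

```python
import random


def jobs_in_service(K: int, d: int, rng: random.Random) -> int:
    """Distinct jobs in service in one hypothetical snapshot of a saturated
    redundancy-d FCFS system with K servers.

    Assumption (illustrative, not from the survey): an infinite backlog where
    job j sends copies to d servers chosen uniformly at random, and every
    server serves the oldest job that has a copy at it.
    """
    if not 1 <= d <= K:
        raise ValueError("need 1 <= d <= K")
    head = [None] * K                  # head[k] = index of the job served by server k
    job = 0
    while None in head:                # draw jobs in FCFS order until all servers are busy
        for k in rng.sample(range(K), d):
            if head[k] is None:        # server k still free: this is its oldest job
                head[k] = job
        job += 1
    return len(set(head))


rng = random.Random(0)
K = 5
print([jobs_in_service(K, d, rng) for d in (1, K - 1, K)])  # [5, 2, 1]
```

For intermediate \(d\), the count varies from one snapshot to the next. This sketch does not estimate \(\bar{\ell}\) in those cases.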

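The inequalities in Note 2 can be checked numerically. This Python sketch uses three example distributions chosen only for illustration. The first is a Weibull with shape 2, whose hazard rate \(r(t)=2t\) is increasing. The second is a 50/50 mixture of exponentials with rates 0.5 and 2, whose hazard rate is decreasing. The third is the exponential, whose hazard rate is constant. A check on a finite grid of nonnegative \(t_1,t_2\) is a sanity check, not a proof.

```python
import math


def sf_weibull2(t: float) -> float:
    """Weibull survival function with shape 2: hazard rate r(t) = 2t (increasing)."""
    return math.exp(-t * t)


def sf_hyperexp(t: float) -> float:
    """50/50 mixture of Exp(rate 0.5) and Exp(rate 2): decreasing hazard rate."""
    return 0.5 * math.exp(-0.5 * t) + 0.5 * math.exp(-2.0 * t)


def sf_exp(t: float) -> float:
    """Exponential survival function: constant hazard rate (memoryless)."""
    return math.exp(-t)


GRID = [0.1 * i for i in range(31)]  # t1, t2 in [0, 3]; only t >= 0 is relevant


def nbu_on_grid(sf, grid=GRID, tol=1e-12) -> bool:
    """Check sf(t1 + t2) <= sf(t1) * sf(t2) for every pair of grid points."""
    return all(sf(a + b) <= sf(a) * sf(b) + tol for a in grid for b in grid)


def nwu_on_grid(sf, grid=GRID, tol=1e-12) -> bool:
    """Check sf(t1 + t2) >= sf(t1) * sf(t2) for every pair of grid points."""
    return all(sf(a + b) >= sf(a) * sf(b) - tol for a in grid for b in grid)


for name, sf in [("Weibull(2)", sf_weibull2), ("hyperexp", sf_hyperexp), ("exp", sf_exp)]:
    print(name, "NBU:", nbu_on_grid(sf), "NWU:", nwu_on_grid(sf))
```

The results agree with the sufficient condition in Note 2. The increasing-hazard Weibull passes only the NBU check, and the decreasing-hazard mixture passes only the NWU check. The memoryless exponential satisfies both checks, because each inequality holds with equality.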
Acknowledgement

The research of E. Anton was supported, and the research of M. Jonckheere partially supported, by the French “Agence Nationale de la Recherche (ANR)” through the project ANR-15-CE25-0004 (ANR JCJC RACON). U. Ayesta has received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19).

Author information

Authors and Affiliations

CNRS, IRIT, 2 Rue Charles Camichel, 31071, Toulouse, France
Elene Anton, Urtzi Ayesta & Ina Maria Verloop
IKERBASQUE - Basque Foundation for Science, 48011, Bilbao, Spain
Urtzi Ayesta
Université de Toulouse, INP, 31071, Toulouse, France
Elene Anton, Urtzi Ayesta & Ina Maria Verloop
UPV/EHU, University of the Basque Country, 20018, Donostia, Spain
Urtzi Ayesta
Instituto de Cálculo - Conicet, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (1428) Pabellón II, Buenos Aires, Argentina
Matthieu Jonckheere

Corresponding author

Correspondence to Elene Anton.

Editor information

Editors and Affiliations

Department of Mathematical Sciences, University of Liverpool, Liverpool, UK
Alexey Piunovskiy
Department of Mathematical Sciences, University of Liverpool, Liverpool, UK
Yi Zhang

About this paper

Cite this paper

Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M. (2021). A Survey of Stability Results for Redundancy Systems. In: Piunovskiy, A., Zhang, Y. (eds) Modern Trends in Controlled Stochastic Processes:. Emergence, Complexity and Computation, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-76928-4_13

DOI: https://doi.org/10.1007/978-3-030-76928-4_13
Published: 05 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76927-7
Online ISBN: 978-3-030-76928-4
eBook Packages: Engineering, Engineering (R0)
