Abstract
Redundancy mechanisms consist of sending several copies of the same job to a subset of servers. They constitute one of the most promising ways to exploit diversity in multi-server applications. However, their pros and cons are still not sufficiently understood in the context of realistic models with generic statistical properties of service-time distributions and correlation structures of copies. We survey recent results on the stability (arguably the first benchmark of performance) of systems with cancel-on-completion redundancy. We also point out open questions and conjectures.
Notes
- 1. When \(d=K-1\), there are d servers that process copies of one job, and the remaining \(K-d=1\) server serves one additional job; hence \(\bar{\ell }=2\). When instead \(d=1\), there is no redundancy and each server serves one job in the saturated system, i.e., \(\bar{\ell }=K\). When \(d=K\), the system behaves as a single server with capacity \(\mu \), that is, \(\bar{\ell }=1\).
- 2. A non-negative random variable X with complementary distribution function \(\bar{F}_X\) is said to be New-Better-than-Used (NBU) if for all \(t_1,t_2\ge 0\), \( \bar{F}_X (t_1+t_2) \le \bar{F}_X(t_1)\bar{F}_X(t_2).\) X is said to be New-Worse-than-Used (NWU) if for all \(t_1,t_2\ge 0\), \(\bar{F}_X (t_1+t_2) \ge \bar{F}_X(t_1)\bar{F}_X(t_2).\) A sufficient condition for X to be NBU (NWU) is to have an increasing (a decreasing) hazard rate, i.e., the hazard rate r(x) of X is increasing (decreasing) in x.
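The NBU and NWU inequalities in the second note can be checked numerically on a grid. The sketch below is only an illustration and not part of the chapter: the Weibull family is chosen here as an assumption because its hazard rate is increasing for shape greater than 1 and decreasing for shape less than 1, and the helper names `weibull_survival` and `classify` are hypothetical. A grid check can refute a property but cannot prove it for all \(t_1,t_2\ge 0\).

```python
import math


def weibull_survival(t, shape, scale=1.0):
    """Survival function P(X > t) of a Weibull distribution, for t >= 0."""
    return math.exp(-((t / scale) ** shape))


def classify(survival, grid, tol=1e-12):
    """Compare survival(t1 + t2) with survival(t1) * survival(t2) on a grid.

    Returns "NBU", "NWU", "both" or "neither" according to which of the
    two inequalities holds at every pair of grid points.
    """
    nbu = True  # survival(t1 + t2) <= survival(t1) * survival(t2)
    nwu = True  # survival(t1 + t2) >= survival(t1) * survival(t2)
    for t1 in grid:
        for t2 in grid:
            joint = survival(t1 + t2)
            product = survival(t1) * survival(t2)
            if joint > product + tol:
                nbu = False
            if joint < product - tol:
                nwu = False
    if nbu and nwu:
        return "both"
    if nbu:
        return "NBU"
    if nwu:
        return "NWU"
    return "neither"


# Illustrative grid of non-negative time points (an arbitrary choice).
grid = [0.1 * i for i in range(51)]

increasing_hazard = classify(lambda t: weibull_survival(t, 2.0), grid)
decreasing_hazard = classify(lambda t: weibull_survival(t, 0.5), grid)
constant_hazard = classify(lambda t: weibull_survival(t, 1.0), grid)
```

With shape 1 the Weibull distribution reduces to the exponential distribution, for which both inequalities hold with equality; this is why the function reports that boundary case separately.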
Acknowledgement
The research of E. Anton was supported, and the research of M. Jonckheere was partially supported, by the French “Agence Nationale de la Recherche (ANR)” through the project ANR-15-CE25-0004 (ANR JCJC RACON). U. Ayesta has received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19).
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M. (2021). A Survey of Stability Results for Redundancy Systems. In: Piunovskiy, A., Zhang, Y. (eds) Modern Trends in Controlled Stochastic Processes. Emergence, Complexity and Computation, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-76928-4_13
DOI: https://doi.org/10.1007/978-3-030-76928-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76927-7
Online ISBN: 978-3-030-76928-4
eBook Packages: Engineering, Engineering (R0)