Replicate to the shortest queues

Queueing Systems

Abstract

This paper introduces a load-balancing policy that interpolates between two well-known policies, namely join the shortest queue (JSQ) and join the least workload (JLW), and studies it in heavy traffic. This policy, which we call replicate to the shortest queues (RSQ(d)), routes jobs from a stream of arrivals into buffers attached to N servers by replicating each arrival into \(1\le d\le N\) tasks and sending the replicas to the d shortest queues. When the first of the tasks reaches a server, its \(d-1\) replicas are canceled. Clearly, RSQ(1) is equivalent to JSQ, and it has been shown that RSQ(N) is equivalent to JLW; intermediate values of d provide a trade-off between good performance measures of JSQ and those of JLW. In heavy traffic, a key property underlying asymptotic analysis of load-balancing policies is state space collapse (SSC). Unlike policies such as JSQ, where SSC is well understood, the treatment of SSC under RSQ(d) requires addressing the massive cancellations that highly complicate the queue length dynamics. Our first main result is that SSC holds under RSQ(d) for possibly heterogeneous servers. Based on this result, we obtain diffusion limits for the queue lengths in the form of one-dimensional reflected Brownian motion, asymptotic characterization of the short-time-averaged delay process and a version of Reiman’s snapshot principle. We illustrate using simulations that as d increases the server workloads become more balanced, and the delay distribution’s tail becomes lighter. We also discuss the implementation complexity of the policy as compared to that of the redundancy routing policy, to which it is closely related.
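The routing and cancellation rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `RSQSystem`, its methods, the lowest-index tie-breaking rule, and the convention that a queue length counts the task in service are all assumptions made here for illustration. Service times and heterogeneous server rates are left out: the caller decides when a server completes.

```python
from collections import deque


class RSQSystem:
    """Toy model of RSQ(d): replicate each arrival to the d shortest queues.

    Assumptions (not taken from the paper): ties are broken by lowest
    server index, and queue length includes the task in service.
    """

    def __init__(self, n_servers, d):
        if not 1 <= d <= n_servers:
            raise ValueError("d must satisfy 1 <= d <= N")
        self.d = d
        self.waiting = [deque() for _ in range(n_servers)]  # buffered replicas
        self.in_service = [None] * n_servers                # job at each server

    def queue_length(self, i):
        return len(self.waiting[i]) + (self.in_service[i] is not None)

    def arrive(self, job):
        """Send one replica of `job` to each of the d shortest queues."""
        order = sorted(range(len(self.waiting)),
                       key=lambda i: (self.queue_length(i), i))
        targets = order[:self.d]
        for i in targets:
            self.waiting[i].append(job)
        for i in targets:
            self._try_start(i)
        return targets

    def _try_start(self, i):
        """If server i is idle, start its next task and cancel the replicas."""
        if self.in_service[i] is None and self.waiting[i]:
            job = self.waiting[i].popleft()
            self.in_service[i] = job
            for q in self.waiting:   # cancel the other d-1 replicas
                if job in q:
                    q.remove(job)

    def complete(self, i):
        """Finish the job at server i and pull the next task, if any."""
        job = self.in_service[i]
        self.in_service[i] = None
        self._try_start(i)
        return job
```

With `d=1` the sketch reduces to join the shortest queue, since each arrival places a single task in the shortest queue and there is nothing to cancel.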

Acknowledgements

The authors are grateful to an associate editor and two referees for careful reading and valuable comments.

Author information

Corresponding author

Correspondence to Gal Mendelson.

Additional information

Rami Atar and Gal Mendelson: Research supported in part by the ISF (Grant 1184/16).

About this article

Cite this article

Atar, R., Keslassy, I. & Mendelson, G. Replicate to the shortest queues. Queueing Syst 92, 1–23 (2019). https://doi.org/10.1007/s11134-019-09605-2

  • DOI: https://doi.org/10.1007/s11134-019-09605-2
