Abstract
Intelligent dispatching is crucial to obtaining low response times in large-scale systems. One common scalable dispatching paradigm is the “power-of-d,” in which the dispatcher queries d servers at random and assigns the job to a server based only on the state of the queried servers. The bulk of power-of-d policies studied in the literature assume that the system is homogeneous, meaning that all servers have the same speed; meanwhile, real-world systems often exhibit server speed heterogeneity. This paper introduces a general framework for describing and analyzing heterogeneity-aware power-of-d policies. The key idea behind our framework is that dispatching policies can make use of server speed information at two decision points: when choosing which d servers to query and when assigning a job to one of those servers. Our framework explicitly separates the dispatching policy into a querying rule and an assignment rule; we consider general families of both rule types. While the strongest assignment rules incorporate both detailed queue-length information and server speed information, these rules typically are difficult to analyze. We overcome this difficulty by focusing on heterogeneity-aware assignment rules that ignore queue length information beyond idleness status. In this setting, we analyze mean response time and formulate novel optimization problems for the joint optimization of querying and assignment. We build upon our optimized policies to develop heuristic queue length-aware dispatching policies. Our heuristic policies perform well in simulation, relative to policies that have appeared in the literature.
Similar content being viewed by others
Notes
Throughout, names and abbreviations of individual rules and policies are rendered in sans-serif font; see “Appendix A” for a list of individual rules and policies proposed, studied, and/or referenced in this paper.
Throughout, names and abbreviations of parameterized families of rules and policies are rendered in bold serif font; see “Appendix A” for a list of families of rules and policies proposed, studied, and/or referenced in this paper.
References
Banawan, S., Zeidat, N.: A comparative study of load sharing in heterogeneous multicomputer systems. In: Proceedings of 25th Annual Simulation Symposium, pp. 22–31. IEEE (1992)
Banawan, S.A., Zahorjan, J.: Load sharing in heterogeneous queueing systems. In: Proceedings of IEEE INFOCOM’89, pp. 731–739 (1989)
Bonomi, F.: On job assignment for a parallel system of processor sharing queues. IEEE Trans. Comput. 39(7), 858–869 (1990)
Chen, H., Ye, H.Q.: Asymptotic optimality of balanced routing. Oper. Res. 60(1), 163–179 (2012)
Dunning, I., Huchette, J., Lubin, M.: Jump: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Feng, H., Misra, V., Rubenstein, D.: Optimal state-free, size-aware dispatching for heterogeneous m/g/-type systems. Perform. Eval 62(1), 475–492 (2005). https://doi.org/10.1016/j.peva.2005.07.031
Gardner, K., Jaleel, J.A., Wickeham, A., Doroudi, S.: Scalable load balancing in the presence of heterogeneous servers. Performance Evaluation p. 102151 (2020)
Gupta, V., Harchol-Balter, M., Sigman, K., Whitt, W.: Analysis of join-the-shortest-queue routing for web server farms. Perform. Eval. 64(9–12), 1062–1081 (2007)
Hellemans, T., Bodas, T., Van Houdt, B.: Performance analysis of workload dependent load balancing policies. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems (2019). https://doi.org/10.1145/3341617.3326150
Hyytiä, E.: Optimal routing of fixed size jobs to two parallel servers. INFOR: Inf. Syst. Oper. Res. 51(4), 215–224 (2013). https://doi.org/10.3138/infor.51.4.215
Izagirre, A., Makowski, A.: Light traffic performance under the power of two load balancing strategy: the case of server heterogeneity. SIGMETRICS Perform. Eval. Rev. 42(2), 18–20 (2014)
Jaleel, J.A., Doroudi, S., Gardner, K., Wickeham, A.: A general “power-of-d” dispatching framework for heterogeneous systems (2021). https://arxiv.org/abs/2112.05823
Koole, G.: A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control Lett. 26(5), 301–303 (1995)
Larsen, R.L.: Control of Multiple Exponential Servers with Application to Computer Systems. Ph.D. thesis, College Park, MD, USA (1981)
Lin, W., Kumar, P.R.: Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 29(8), 696–703 (1984)
Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)
Lubin, M., Dunning, I.: Computing in operations research using Julia. INFORMS J. Comput. 27(2), 238–248 (2015). https://doi.org/10.1287/ijoc.2014.0623
Luh, H.P., Viniotis, I.: Threshold control policies for heterogeneous server systems. Math. Methods Oper. Res. 55(1), 121–142 (2002)
Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)
Mukhopadhyay, A., Mazumdar, R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2016)
Nelson, R.D., Philips, T.K.: An Approximation to the Response Time for Shortest Queue Routing, vol. 17. ACM, New York (1989)
Rubinovitch, M.: The slow server problem. J. Appl. Probab. 22(1), 205–213 (1985)
Rubinovitch, M.: The slow server problem: a queue with stalling. J. Appl. Probab. 22(4), 879–892 (1985)
Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Autom. Remote. Control. 70(12), 2013–2023 (2009)
Selen, J., Adan, I., Kapodistria, S.: Approximate performance analysis of generalized join the shortest queue routing. In: Proceedings of the 9th EAI International Conference on Performance Evaluation Methodologies and Tools, pp. 103–110. ICST (Institute for Computer Sciences, Social-Informatics and ... (2016)
Selen, J., Adan, I., Kapodistria, S., van Leeuwaarden, J.: Steady-state analysis of shortest expected delay routing. Queueing Syst. 84(3–4), 309–354 (2016)
Sethuraman, J., Squillante, M.S.: Optimal stochastic scheduling in multiclass parallel queues. SIGMETRICS Perform. Eval. Rev. 27(1), 93–102 (1999). https://doi.org/10.1145/301464.301483
Stolyar, A.: Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80(4), 341–361 (2015)
Stolyar, A.L.: Pull-based load distribution among heterogeneous parallel servers: the case of multiple routers. Queueing Syst. 85(1–2), 31–65 (2017)
Tantawi, A.N., Towsley, D.: Optimal static load balancing in distributed computer systems. J. ACM (JACM) 32(2), 445–465 (1985)
Vargaftik, S., Keslassy, I., Orda, A.: LSQ: load balancing in large-scale heterogeneous systems with multiple dispatchers. IEEE/ACM Transactions on Networking, 28(3), 1186–1198 (2020). https://urldefense.com/v3/. https://doi.org/10.1109/TNET.2020.2980061
Vvedenskaya, N., Dobrushin, R., Karpelevich, F.: Queueing system with selection of the shortest of two queues: an asymptotic approach. Problemy Peredachi Informatsii 32(1), 20–34 (1996)
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006). https://doi.org/10.1007/s10107-004-0559-y
Wang, C., Feng, C., Cheng, J.: Distributed join-the-idle-queue for low latency cloud services. IEEE/ACM Trans. Netw. 26(5), 2309–2319 (2018)
Weber, R.R.: On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2), 406–413 (1978)
Weng, W., Zhou, X., Srikant, R.: Optimal load balancing with locality constraints. Proc. ACM Meas. Anal. Comput. Syst. 4(3), 1–37 (2020)
Whitt, W.: Deciding which queue to join: some counterexamples. Oper. Res. 34(1), 55–62 (1986)
Winston, W.: Optimality of the shortest line discipline. J. Appl. Probab. 14(1), 181–189 (1977)
Zhou, X., Shroff, N., Wierman, A.: Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers. Perform. Eval. 145, 102146 (2021)
Zhou, X., Wu, F., Tan, J., Sun, Y., Shroff, N.: Designing low-complexity heavy-traffic delay-optimal load balancing schemes: theory to algorithms. Proc. ACM Measu. Anal. Comput. Syst. 1(2), 39 (2017)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Abdul Jaleel, J., Doroudi, S., Gardner, K. et al. A general “power-of-d” dispatching framework for heterogeneous systems. Queueing Syst 102, 431–480 (2022). https://doi.org/10.1007/s11134-022-09736-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11134-022-09736-z