Open problems in queueing theory inspired by datacenter computing

Harchol-Balter, Mor

doi:10.1007/s11134-020-09684-6

Published: 27 January 2021

Queueing Systems, Volume 97, pages 3–37 (2021)

Mor Harchol-Balter ORCID: orcid.org/0000-0003-1721-6759¹

Abstract

Datacenter operations today provide a plethora of new queueing and scheduling problems. The notion of a “job” has become more general and multi-dimensional. The ways in which jobs and servers can interact have grown in complexity, involving parallelism, speedup functions, precedence constraints, and task graphs. The workloads are vastly more variable and more heavy-tailed. Even the performance metrics of interest are broader than in the past, with multi-dimensional service-level objectives in terms of tail probabilities. The purpose of this article is to expose queueing theorists to new models, while providing suggestions for many specific open problems of interest, as well as some insights into their potential solution.

Notes

In the multiserver job model, we assume First-Come-First-Served (FCFS) scheduling, which is the policy used in datacenters. This should not be confused with the virtual machine (VM) packing problem, where the literature has focused on packing jobs into VMs based on the number of resources they request, so as to achieve throughput optimality (see [77, 85, 109, 110, 118]). However, even in the VM packing problem, waste can occur.
In the above example, we think of the job as running alone on the \(k\) servers. If two jobs time-share the same \(k\) servers, then the service time of each doubles.
If \(k < 1\), it is common to assume that \(s(k) = k\), which is consistent with the intuition that a job allocated half a server runs at half speed.
Note that SRPT and FCFS are equivalent in the case where all jobs have the same size.
The optimal allocation is derived both for the case where the goal is to minimize mean response time and the case where the goal is to minimize mean slowdown. The slowdown metric is discussed in Sect. 7.1.3.
The Gittins policy reduces to SRPT when job sizes are known.
A job’s “rank” is its priority, where lower rank is better, and where ties are broken in FCFS order. Rank is a function of age, but can also depend on a job’s size or class [129].
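
The multiserver-job notes above can be made concrete with a minimal sketch. `fcfs_start` shows how FCFS, which never lets a later job jump ahead, can leave servers idle even when a job further back in the queue would fit. `amdahl_speedup` and `service_time` illustrate a speedup function \(s(k)\) that follows the \(s(k) = k\) convention for \(k < 1\) and equal time-sharing among jobs on the same servers. Using Amdahl's law for \(k \ge 1\), the parallelizable fraction `p`, and all function names are hypothetical choices for illustration, not the article's model.

```python
from typing import List, Tuple


def fcfs_start(k: int, needs: List[int]) -> Tuple[List[int], int]:
    """Start jobs from the head of an FCFS queue on k idle servers.

    needs[i] is the number of servers job i requires. FCFS stops at the
    first job that does not fit, so later jobs wait even if they would fit.
    Returns the indices of the started jobs and the number of idle servers.
    """
    free = k
    started = []
    for i, need in enumerate(needs):
        if need > free:
            break  # head-of-line blocking: the leftover servers stay idle
        free -= need
        started.append(i)
    return started, free


def amdahl_speedup(k: float, p: float) -> float:
    """Hypothetical speedup s(k) for a job whose parallelizable fraction is p.

    For k < 1 we use s(k) = k (half a server runs the job at half speed).
    For k >= 1 we assume Amdahl's law, chosen here only for illustration.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if k < 1:
        return k
    return 1.0 / ((1.0 - p) + p / k)


def service_time(size: float, k: float, p: float, sharers: int = 1) -> float:
    """Service time of a job of the given size on k servers.

    Assumes `sharers` jobs time-share the same k servers equally, so each
    runs at 1/sharers of the full rate (two sharers double the time).
    """
    return sharers * size / amdahl_speedup(k, p)
```

For example, with \(k = 4\) servers and a queue whose jobs need 3, 2, and 1 servers, `fcfs_start` starts only the first job and leaves one server idle, although the third job would fit.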

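The scheduling notes above (SRPT versus FCFS, the slowdown metric, Gittins, and rank with FCFS tie-breaking) can be checked with a small single-server sketch. It assumes known job sizes for SRPT, takes slowdown to be response time divided by job size (the usual definition), and computes a Gittins rank for a discrete job-size distribution as the minimum, over service quanta, of expected work divided by completion probability. In `srpt_response_times` the heap key (remaining size, arrival order) is SRPT's rank with ties broken in FCFS order. All names and the discrete-distribution restriction are illustrative assumptions, not the article's code.

```python
import heapq
from typing import List, Sequence


def fcfs_response_times(arrivals: Sequence[float], sizes: Sequence[float]) -> List[float]:
    """Single-server FCFS; assumes arrivals are sorted in increasing order."""
    t, out = 0.0, []
    for a, s in zip(arrivals, sizes):
        t = max(t, a) + s  # wait for the server, then run to completion
        out.append(t - a)
    return out


def srpt_response_times(arrivals: Sequence[float], sizes: Sequence[float]) -> List[float]:
    """Single-server preemptive SRPT; ties broken in FCFS (arrival) order."""
    n = len(arrivals)
    order = sorted(range(n), key=lambda i: arrivals[i])
    heap = []  # entries: (remaining size, arrival rank, job id)
    resp = [0.0] * n
    t, k = 0.0, 0
    while k < n or heap:
        if not heap:
            t = max(t, arrivals[order[k]])  # server idle: jump to next arrival
        while k < n and arrivals[order[k]] <= t:
            j = order[k]
            heapq.heappush(heap, (sizes[j], k, j))
            k += 1
        rem, tie, j = heapq.heappop(heap)  # lowest rank runs next
        nxt = arrivals[order[k]] if k < n else float("inf")
        if t + rem <= nxt:
            t += rem  # job finishes before the next arrival
            resp[j] = t - arrivals[j]
        else:
            rem -= nxt - t  # serve until the next arrival, then re-rank
            t = nxt
            heapq.heappush(heap, (rem, tie, j))
    return resp


def mean_slowdown(resp: Sequence[float], sizes: Sequence[float]) -> float:
    """Mean over jobs of response time divided by job size."""
    return sum(r / s for r, s in zip(resp, sizes)) / len(resp)


def gittins_rank(age: float, values: Sequence[float], probs: Sequence[float]) -> float:
    """Gittins rank (lower is better) of a job with the given age.

    Job size X takes value values[i] with probability probs[i]. For a
    discrete X the optimal quantum ends at a support point x > age, so
    rank = min over x of E[min(X - age, x - age) | X > age] / P(X <= x | X > age).
    """
    live = [(x, p) for x, p in zip(values, probs) if x > age]
    total = sum(p for _, p in live)
    best = float("inf")
    for x, _ in live:
        d = x - age
        work = sum(p * min(y - age, d) for y, p in live) / total
        done = sum(p for y, p in live if y <= x) / total
        best = min(best, work / done)
    return best
```

With equal sizes (arrivals at 0, 1, 2, each of size 3), both policies give response times 3, 5, and 7. With arrivals at 0 and 1 and sizes 10 and 1, SRPT gives 11 and 1 while FCFS gives 10 and 10. For a point-mass size distribution, `gittins_rank` returns the remaining size, which is exactly SRPT's rank.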
References

[1] Amazon EC2. http://aws.amazon.com/ec2/. Accessed 15 Nov 2020
[2] Azure Public Dataset (2019). https://github.com/Azure/AzurePublicDataset. Accessed 15 Nov 2020
[3] Google Compute Engine. http://cloud.google.com/products/compute-engine.html. Accessed 15 Nov 2020
[4] Windows Azure. http://www.windowsazure.com/. Accessed 15 Nov 2020
[5] Datacenter Spending (2020). https://www.cbronline.com/news/data-centre-spending. Accessed 15 Nov 2020
[6] Flexera: State of the Cloud Report (2020). https://www.flexera.com/blog/industry-trends/trend-of-cloud-computing-2020/. Accessed 15 Nov 2020
[7] Aalto, S., Ayesta, U., Righter, R.: On the Gittins index in the M/G/1 queue. Queueing Syst. 63(1), 437–458 (2009)
[8] Aalto, S., Ayesta, U., Righter, R.: Properties of the Gittins index with application to optimal scheduling. Probab. Eng. Inf. Sci. 25(3), 269–288 (2011)
[9] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), pp. 265–283 (2016)
[10] Abate, J., Choudhury, G.L., Whitt, W.: Asymptotics for steady-state tail probabilities in structured Markov queueing models. Stoch. Mod. 10(1), 99–143 (1994)
[11] Abate, J., Choudhury, G.L., Whitt, W.: Waiting-time tail probabilities in queues with long-tail service-time distributions. Queueing Syst. 16, 311–338 (1994)
[12] Abate, J., Choudhury, G.L., Whitt, W.: An introduction to numerical transform inversion and its application to probability models. In: Grassmann, W.K. (ed.) Computational Probability, pp. 257–323. Springer, Boston (2000)
[13] Abate, J., Whitt, W.: A unified framework for numerically inverting Laplace transforms. INFORMS J. Comput. 18(4), 408–421 (2006)
[14] Acar, U., Blelloch, G.E., Blumofe, R.: The data locality of work stealing. Theory Comput. Syst. 35(3), 321–347 (2002)
[15] Afanaseva, L., Bashtova, E., Grishunina, S.: Stability analysis of a multi-server model with simultaneous service and a regenerative input flow. Methodol. Comput. Appl. Probab. 22, 1439–1455 (2020)
[16] Afanaseva, L., Grishunina, S.: Stability conditions for a multiserver queueing system with a regenerative input flow and simultaneous service of a customer by a random number of servers. Queueing Syst. 94, 213–241 (2020)
[17] Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallel DAG jobs online to minimize average flow time. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’16), pp. 176–189 (2016)
[18] Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallelizable jobs online to minimize the maximum flow time. In: Symposium on Parallel Algorithms and Architectures (SPAA’16), pp. 195–205 (2016)
[19] Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallelizable jobs online to maximize throughput. In: LATIN 2018: Theoretical Informatics—13th Latin American Symposium, Buenos Aires, Argentina, pp. 755–776 (2018)
[20] Ahmad, N., Greenberg, A.G., Lahiri, P., Maltz, D., Patel, P.K., Sengupta, S., Vaid, K.V.: Distributed load balancer. Google Patents. U.S. Patent App. 12/189,438 (2008)
[21] Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M.: On the stability of redundancy models (2019). arXiv:1903.04414
[22] Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M.: Improving the performance of heterogeneous data centers through redundancy (2020). arXiv:2003.01394
[23] Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. In: 10th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 119–129 (1998)
[24] Arthurs, E., Kaufman, J.: Sizing a message store subject to blocking criteria. In: IFIP Performance Conference, pp. 547–564 (1979)
[25] AWS: Netflix & AWS Lambda Case Study. https://aws.amazon.com/solutions/case-studies/netflix-and-aws-lambda/. Accessed 15 Nov 2020
[26] AWS: Step Functions. https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html. Accessed 15 Nov 2020
[27] Baccelli, F., Foss, S.: Poisson hail on a hot ground. J. Appl. Probab. 48(A), 343–366 (2011)
[28] Baccelli, F., Makowski, A.M.: Simple computable bounds for the fork–join queue. Technical Report RR-0394, INRIA (1985)
[29] Baccelli, F., Makowski, A.M., Shwartz, A.: The fork–join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab. 21, 629–660 (1989)
[30] Barroso, L.A., Hölzle, U.: The case for energy-proportional computing. Computer 40(12), 33–37 (2007)
[31] Bean, N.G., Gibbens, R.J., Zachary, S.: Asymptotic analysis of single resource loss systems in heavy traffic, with applications to integrated networks. Adv. Appl. Probab. 27(1), 273–292 (1995)
[32] Bekker, R., Borst, S., Núñez-Queija, R.: Performance of TCP-friendly streaming sessions in the presence of heavy-tailed elastic flows. Perform. Eval. 61(2), 143–162 (2005)
[33] Benameur, N., Ben Fredj, S., Delcoigne, F., Oueslati-Boulahia, S., Roberts, J.W.: Integrated admission control for streaming and elastic traffic. In: International Workshop on Quality of Future Internet Services, pp. 69–81 (2001)
[34] Berg, B., Dorsman, J.-P., Harchol-Balter, M.: Towards optimality in parallel job scheduling. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 1(2), 1–30 (2017). Article 40
[35] Berg, B., Harchol-Balter, M., Moseley, B., Wang, W., Whitehouse, J.: Optimal resource allocation for elastic and inelastic jobs. In: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’20), pp. 75–87, Philadelphia, PA (2020)
[36] Berg, B., Vesilo, R., Harchol-Balter, M.: heSRPT: parallel scheduling to minimize mean slowdown. In: 38th International Symposium on Computer Performance, Modeling, Measurement, and Evaluation (IFIP PERFORMANCE 2020), Milan, Italy (2020)
[37] Berger, D., Berg, B., Zhu, T., Sen, S., Harchol-Balter, M.: RobinHood: tail latency aware caching—dynamic reallocation from cache-rich to cache-poor. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 195–212, Carlsbad, CA (2018)
[38] Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08), pp. 72–81, New York, NY (2008)
[39] Blelloch, G., Gibbons, P., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)
[40] Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Simhadri, H.V.: Scheduling irregular parallel computations on hierarchical caches. In: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11), pp. 355–366, San Jose, California (2011)
[41] Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)
[42] Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. In: IEEE Symposium on Foundations of Computer Science, pp. 356–368 (1994)
[43] Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)
[44] Blumofe, R.D., Papadopoulos, D.: Hood: a user-level threads library for multiprogrammed multiprocessors. Technical Report, University of Texas at Austin (1999)
[45] Bonald, T., Proutière, A.: On performance bounds for the integration of elastic and adaptive streaming flows. In: Joint International ACM SIGMETRICS/Performance Conference on Measurement and Modeling of Computer Systems, pp. 235–245 (2004)
[46] Borst, S., Núñez-Queija, R., Zwart, B.: Sojourn time asymptotics in processor-sharing queues. Queueing Syst. 53(1–2), 31–51 (2006)
[47] Borst, S.C., Boxma, O.J., Núñez-Queija, R., Zwart, B.: The impact of the service discipline on delay asymptotics. Perform. Eval. 54(2), 175–206 (2003)
[48] Boxma, O.J., Deng, Q., Zwart, B.: Waiting-time asymptotics for the M/G/2 queue with heterogeneous servers. Queueing Syst. 40(1), 5–31 (2002)
[49] Boxma, O.J., Zwart, B.: Tails in scheduling. SIGMETRICS Perform. Eval. Rev. 34(4), 13–20 (2007)
[50] Brill, P.H., Green, L.: Queues in which customers receive simultaneous service from a random number of servers: a system point approach. Manag. Sci. 30(1), 51–68 (1984)
[51] Cera, M.C., Georgiou, Y., Richard, O., Maillard, N., Navaux, P.O.A.: Supporting malleability in parallel architectures with dynamic CPUSETs mapping and dynamic MPI. In: Kant, K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) International Conference on Distributed Computing and Networking (ICDCN’10), pp. 242–257 (2010)
[52] Chowdhury, R.A., Ramachandran, V., Silvestri, F., Blakeley, B.: Oblivious algorithms for multicores and networks of processors. J. Parallel Distrib. Comput. 73(7), 911–925 (2018)
[53] Crovella, M., Harchol-Balter, M., Murta, C.: Task assignment in a distributed system: improving performance by unbalancing load. In: Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pp. 268–269. Poster Session (1998)
[54] Dasylva, A., Srikant, R.: Bounds on the performance of admission control and routing policies for general topology networks with multiple call centers. In: Eighteenth Annual IEEE INFOCOM’99 International Conference on Computer Communications, pp. 505–512 (1999)
[55] Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)
[56] Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
[57] Delimitrou, C., Kozyrakis, C.: Quasar: resource-efficient and QoS-aware cluster management. In: ASPLOS’14, pp. 127–144, Salt Lake City, Utah (2014)
[58] den Iseger, P.: Numerical transform inversion using Gaussian quadrature. Probab. Eng. Inf. Sci. 20, 1–44 (2006)
[59] den Iseger, P., Gruntjes, P., Mandjes, M.: A Wiener–Hopf based approach to numerical computations in fluctuation theory for Lévy processes. Math. Methods Oper. Res. 78(1), 101–118 (2013)
[60] Dubner, H., Abate, J.: Numerical inversion of Laplace transforms by relating them to the finite Fourier cosine transform. J. ACM 15(1), 115–123 (1968)
[61] Fan, Z., Sen, R., Koutris, P., Albarghouthi, A.: Automated tuning of query degree of parallelism via machine learning. In: Proceedings of the 3rd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (2020)
[62] Filippopoulos, D., Karatza, H.: An M/M/2 parallel system model with pure space sharing among rigid jobs. Math. Comput. Model. 45(5), 491–530 (2007)
[63] Foss, S., Konstantopoulos, T., Mountford, T.: Power law condition for stability of Poisson hail. J. Theor. Probab. 31, 684–704 (2018)
[64] Foss, S., Korshunov, D.: Heavy tails in multi-server queue. Queueing Syst. Theory Pract. 52, 31–48 (2006)
[65] Foss, S., Korshunov, D., Zachary, S.: An Introduction to Heavy-Tailed and Subexponential Distributions, 2nd edn. Springer, New York (2013)
[66] Fouladi, S., Wahby, R.S., Shacklett, B., Balasubramaniam, K.V., Zeng, W., Bhalerao, R., Sivaraman, A., Porter, G., Winstein, K.: Encoding, fast and slow: low-latency video processing using thousands of tiny threads. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17), pp. 363–376, Boston, MA (2017)
[67] Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: ACM PLDI, pp. 212–223 (1998)
[68] Gandhi, A., Doroudi, S., Harchol-Balter, M., Scheller-Wolf, A.: Exact analysis of the M/M/k/setup class of Markov chains via Recursive Renewal Reward. Queueing Syst. Theory Appl. 77(2), 177–209 (2014)
[69] Gandhi, A., Gupta, V., Harchol-Balter, M., Kozuch, M.: Optimality analysis of energy-performance trade-off for server farm management. Perform. Eval. 67(11), 1155–1171 (2010)
[70] Gandhi, A., Harchol-Balter, M., Adan, I.: Server farms with setup costs. Perform. Eval. 67(11), 1123–1138 (2010)
[71] Gandhi, A., Harchol-Balter, M., Raghunathan, R., Kozuch, M.: AutoScale: dynamic, robust capacity management for multi-tier data centers. ACM Trans. Comput. Syst. 30(4), 1–26 (2012)
[72] Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Van Houdt, B.: A better model for job redundancy: decoupling server slowdown and job size. IEEE/ACM Trans. Netw. 25(6), 3353–3367 (2017)
[73] Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017)
[74] Gardner, K., Zbarsky, S., Doroudi, S., Harchol-Balter, M., Hyytiä, E., Scheller-Wolf, A.: Queueing with redundant requests: exact analysis. Queueing Syst. Theory Appl. 83(3), 227–259 (2016)
[75] Gardner, K., Zbarsky, S., Doroudi, S., Harchol-Balter, M., Hyytiä, E., Scheller-Wolf, A.: Reducing latency via redundant requests: exact analysis. In: ACM SIGMETRICS 2015 Conference on Measurement and Modeling of Computer Systems, pp. 347–360 (2015)
[76] Gavish, B., Schweitzer, P.J.: The Markovian queue with bounded waiting time. Manag. Sci. 23(12), 1349–1357 (1977)
[77] Ghaderi, J.: Randomized algorithms for scheduling VMs in the cloud. In: 35th Annual IEEE International Conference on Computer Communications, INFOCOM 2016, San Francisco, CA, USA, April 10–14, 2016, pp. 1–9 (2016)
[78] Gittins, J.C., Glazebrook, K.D., Weber, R.: Multi-armed Bandit Allocation Indices. Wiley, New York (2011)
[79] Glynn, P.W., Whitt, W.: Logarithmic asymptotics for steady-state tail probabilities in a single-server queue. J. Appl. Probab. 31(A), 131–156 (1994)
[80] Goldstein, S.C., Schauser, K.E., Culler, D.E.: Lazy threads: implementing a fast parallel call. J. Parallel Distrib. Comput. 37(1), 5–20 (1996)
[81] Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math. 5, 287–326 (1979)
[82] Grosof, I., Harchol-Balter, M., Scheller-Wolf, A.: Stability for two-class multiserver-job systems (2020). arXiv:2010.00631
[83] Grosof, I., Scully, Z., Harchol-Balter, M.: SRPT for multiserver systems. Perform. Eval. 127–128, 154–175 (2018)
[84] Grosof, I., Scully, Z., Harchol-Balter, M.: Load balancing guardrails: keeping your heavy traffic on the road to low response times. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 3(2), 1–31 (2019). Article 42
[85] Guo, M., Guan, Q., Ke, W.: Optimal scheduling of VMs in queueing cloud computing systems with a heterogeneous workload. IEEE Access 6, 15178–15191 (2018)
[86] Gupta, A., Acun, B., Sarood, O., Kale, L.: Towards realizing the potential of malleable jobs. In: IEEE International Conference on High Performance Computing (HiPC’14) (2014)
[87] Harchol-Balter, M.: Network analysis without exponentiality assumptions. Ph.D. thesis, University of California at Berkeley (1996)
[88] Harchol-Balter, M.: The effect of heavy-tailed job size distributions on computer system design. In: Proceedings of ASA-IMS Conference on Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, Washington, DC (1999)
[89] Harchol-Balter, M.: Task assignment with unknown duration. J. ACM 49(2), 260–288 (2002)
[90] Harchol-Balter, M.: Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, Cambridge (2013)
[91] Harchol-Balter, M., Crovella, M., Murta, C.: On choosing a task assignment policy for a distributed server system. In: Lecture Notes in Computer Science, No. 1469: 10th International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, pp. 231–242 (1998)
[92] Harchol-Balter, M., Downey, A.: Exploiting process lifetime distributions for dynamic load balancing. In: Proceedings of ACM SIGMETRICS, pp. 13–24, Philadelphia, PA (1996)
[93] Harchol-Balter, M., Downey, A.: Exploiting process lifetime distributions for dynamic load balancing. ACM Trans. Comput. Syst. 15(3), 253–285 (1997)
[94] Harchol-Balter, M., Schroeder, B., Bansal, N., Agrawal, M.: Size-based scheduling to improve web performance. ACM Trans. Comput. Syst. 21(2), 207–233 (2003)
[95] Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41, 33–38 (2008)
[96] Horvath, G., Horvath, I., Almousa, S.A.-D., Telek, M.: Numerical inverse Laplace transformation using concentrated matrix exponential distributions. Perform. Eval. 137, 1–22 (2019)
[97] Hunt, P.J., Kurtz, T.G.: Large loss networks. Stoch. Process. Appl. 53(2), 363–378 (1994)
[98] Hyytiä, E., Aalto, S., Penttinen, A.: Minimizing slowdown in heterogeneous size-aware dispatching systems. In: Proceedings of the 2012 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2012)
[99] Jonas, E., Pu, Q., Venkataraman, S., Stoica, I., Recht, B.: Occupy the cloud: distributed computing for the 99%. In: Proceedings of the 2017 Symposium on Cloud Computing, pp. 445–451, New York, NY (2017)
[100] Jonas, E., Schleier-Smith, J., Sreekanti, V., Tsai, C., Khandelwal, A., Pu, Q., Shankar, V., Carreira, J., Krauth, K., Yadwadkar, N.J., Gonzalez, J.E., Popa, R.A., Stoica, I., Patterson, D.A.: Cloud programming simplified: a Berkeley view on serverless computing (2019). CoRR, arXiv:1902.03383
[101] Joshi, G., Soljanin, E., Wornell, G.: Efficient replication of queued tasks for latency reduction in cloud systems. In: Allerton Conference on Communication, Control, and Computing, University of Illinois, Urbana-Champaign (2015)
[102] Kim, S.S.: M/M/s queueing system where customers demand multiple server use. Ph.D. thesis, Southern Methodist University (1979)
[103] Lee, K., Shah, N.B., Huang, L., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. IEEE Trans. Inf. Theory 63(5), 2822–2842 (2017)
[104] Leonardi, S., Raz, D.: Approximating total flow time on parallel machines. In: Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), pp. 110–119 (1997)
[105] Li, H., Groep, D., Wolters, L.: Workload characteristics of a multicluster supercomputer. In: 10th International Conference on Job Scheduling Strategies for Parallel Processing (IPPS’04), pp. 176–193. Springer (2004)
[106] Lin, S.-H., Paolieri, M., Chou, C.F., Golubchik, L.: A model-based approach to streamlining distributed training for asynchronous SGD. In: MASCOTS 2018, pp. 306–318 (2018)
[107] Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J.R., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)
[108] Madni, S.H.H., Latiff, M.S.A., Abdullahi, M., Abdulhamid, S.M., Usman, M.J.: Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment. PLoS ONE 12(5), 1–26 (2017)
[109] Maguluri, S.T., Srikant, R.: Scheduling jobs with unknown duration in clouds. IEEE/ACM Trans. Netw. 22(6), 1938–1951 (2014)
[110] Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in cloud computing clusters. In: Proceedings of IEEE INFOCOM, pp. 702–710 (2012)
[111] Massoulié, L., Roberts, J.W.: Bandwidth sharing and admission control for elastic traffic. Telecommun. Syst. 15, 185–201 (2000)
[112] Melikov, A.: Computation and optimization methods for multiresource queues. Cybern. Syst. Anal. 32(6), 821–836 (1996)
[113] Mok, A.: Fundamental design problems of distributed systems for the hard real-time environment. Ph.D. thesis, MIT, Department of EE and CS (1983)
[114] Morozov, E., Rumyantsev, A.S.: Stability analysis of a MAP/M/s cluster model by matrix-analytic method. In: Fiems, D., Paolieri, M., Platis, A.N. (eds.) Computer Performance Engineering—13th European Workshop, EPEW 2016, Chios, Greece, October 5–7, 2016, Proceedings, volume 9951 of Lecture Notes in Computer Science, pp. 63–76. Springer (2016)
[115] Narlikar, G.J.: Scheduling threads for low space requirement and good locality. Theory Comput. Syst. 35(2), 151–187 (2002)
[116] Nelson, R.D., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)
[117] Ponomarenko, L., Kim, C.S., Melikov, A.: Performance Analysis and Optimization of Multi-traffic on Communication Networks. Springer, Berlin (2010)
[118] Psychas, K., Ghaderi, J.: On non-preemptive VM scheduling in the cloud. Proc. ACM Meas. Anal. Comput. Syst. 1(2), 1–29 (2017). Article 35
[119] Raaijmakers, Y., Borst, S., Boxma, O.: Delta probing policies for redundancy. Perform. Eval. 127–128, 21–35 (2018)
[120] Raaijmakers, Y., Borst, S., Boxma, O.: Redundancy scheduling with scaled Bernoulli service requirements. Queueing Syst. 93(1–2), 67–82 (2019)
[121] Raaijmakers, Y., Borst, S., Boxma, O.: Stability of redundancy systems with processor sharing. In: Proceedings of the 13th International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS’20), pp. 120–127 (2020)
[122] Rizk, A., Poloczek, F., Ciucu, F.: Stochastic bounds in fork–join queueing systems under full and partial mapping. Queueing Syst. 83(3), 261–291 (2016)
[123] Rumyantsev, A., Morozov, E.: Stability criterion of a multiserver model with simultaneous service. Ann. Oper. Res. 252(1), 29–39 (2017)
[124] Schrage, L.E.: A proof of the optimality of the shortest remaining processing time discipline. Oper. Res. 16, 678–690 (1968)
[125] Schrage, L.E., Miller, L.W.: The queue M/G/1 with the shortest remaining processing time discipline. Oper. Res. 14, 670–684 (1966)
[126] Schroeder, B., Harchol-Balter, M.: Evaluation of task assignment policies for supercomputing servers: the case for load unbalancing and fairness. Clust. Comput. J. Netw. Softw. Tools Appl. 7(2), 151–161 (2004)
[127] Scully, Z., Grosof, I., Harchol-Balter, M.: The Gittins policy is nearly optimal in the M/G/k under extremely general conditions. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 3(4), 1–29 (2020). Article 43
[128] Scully, Z., Grosof, I., Harchol-Balter, M.: Optimal multiserver scheduling with unknown job sizes in heavy traffic. In: 38th International Symposium on Computer Performance, Modeling, Measurement, and Evaluation (IFIP PERFORMANCE 2020), Milan, Italy (2020)
[129] Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: SOAP: one clean analysis of all age-based scheduling policies. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 2(1), 1–30 (2018). Article 16
[130] Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: Simple near-optimal scheduling for the M/G/1. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 4(1), 1–29 (2020). Article 11
[131] Shankar, V., Krauth, K., Pu, Q., Jonas, E., Venkataraman, S., Stoica, I., Recht, B., Ragan-Kelley, J.: numpywren: serverless linear algebra (2018). CoRR, arXiv:1810.09679
[132] Shneer, S., Stolyar, A.: Large-scale parallel server system with multi-component jobs (2020). arXiv:2006.11256
[133] Sigman, K.: Appendix: a primer on heavy-tailed distributions. Queueing Syst. 33(1/3), 261–275 (1999)
[134] Simhadri, H.V., Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Kyrola, A.: Experimental analysis of space-bounded schedulers. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’14), pp. 30–41, Prague, Czech Republic (2014)
[135] Smith, W.L.: On the distribution of queueing times. Math. Proc. Camb. Philos. Soc. 49(3), 449–461 (1953)
[136] Snyder, B.: Server virtualization has stalled, despite the hype (2010). InfoWorld. https://www.infoworld.com/article/2624771/server-virtualization-has-stalled--despite-the-hype.html. Accessed 15 Nov 2020
[137] Sreekanti, V., Wu, C., Lin, X.C., Schleier-Smith, J., Gonzalez, J., Hellerstein, J.M., Tumanov, A.: Cloudburst: stateful functions-as-a-service. Proc. VLDB Endow. 13(11), 2438–2452 (2020)
[138] Sun, Y., Zheng, Z., Koksal, C.E., Kim, K.-H., Shroff, N.B.: Provably delay efficient data retrieving in storage clouds. In: Proceedings of IEEE INFOCOM (2015)
[139] Talbot, A.: The accurate numerical inversion of Laplace transforms. IMA J. Appl. Math. 23(1), 97–120 (1979)
[140] Tang, C., Yu, K., Veeraraghavan, K., Kaldor, J., Michelson, S., Kooburat, T., Anbudurai, A., Clark, M., Gogia, K., Cheng, L., Christensen, B., Gartrell, A., Khutornenko, M., Kulkarni, S., Pawlowski, M., Pelkonen, T., Rodrigues, A., Tibrewal, R., Venkatesan, V., Zhang, P.: Twine: a unified cluster management system for shared infrastructure. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI’20) (2020)
[141] Thomasian, A.: Analysis of fork/join and related queueing systems. ACM Comput. Surv. 47(2), 1–71 (2014)
[142] Tian, H., Zheng, Y., Wang, W.: Characterizing and synthesizing task dependencies of data-parallel jobs in Alibaba cloud. In: 10th ACM Symposium on Cloud Computing (SoCC’19), Santa Cruz, CA (2019)
[143] Tikhonenko, O.M.: Generalized Erlang problem for service systems with finite total capacity. Probl. Inf. Transm. 41(3), 243–253 (2005)
[144] Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Harchol-Balter, M., Wilkes, J.: Borg: the next generation. In: Proceedings of the 15th European Conference on Computer Systems (EuroSys’20), pp. 1–14, Greece (2020)
[145] Trueman, C.: Why data centres are the new frontier in the fight against climate change. Computerworld (2019)
[146] Van Dijk, N.M.: Blocking of finite source inputs which require simultaneous servers with general think and holding times. Oper. Res. Lett. 8(1), 45–52 (1989)
[147] Vandevoorde, M.T., Roberts, E.S.: WorkCrews: an abstraction for controlling parallelism. Int. J. Parallel Program. 17(4), 347–366 (1988)
[148] Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., Wilkes, J.: Large-scale cluster management at Google with Borg. In: Proceedings of the 10th European Conference on Computer Systems, p. 18 (2015)
[149] Wang, D., Joshi, G., Wornell, G.W.: Efficient straggler replication in large-scale parallel computing. Proc. ACM Meas. Model. Comput. Syst. (ACM SIGMETRICS 2019) 4(2), 1–23 (2019). Article 7
[150] Wang, W., Harchol-Balter, M., Jiang, H., Scheller-Wolf, A., Srikant, R.: Delay asymptotics and bounds for multi-task parallel jobs. Queueing Syst. Theory Appl. 91(3), 207–239 (2019)
[151] Wang, W., Xie, Q., Harchol-Balter, M.: Zero queueing for multi-server jobs (2020). arXiv:2011.10521
[152] Wardley, S.: Why the fuss about serverless? (2016). https://blog.gardeviance.org/2016/11/why-fuss-about-serverless.html. Accessed 15 Nov 2020
[153] Welch, P.D.: On a generalized M/G/1 queueing process in which the first customer of each busy period receives exceptional service. Oper. Res. 12, 736–752 (1964)
[154] Weng, W., Wang, W.: Dispatching parallel jobs to achieve zero queueing delay (2020). arXiv:2004.02081
[155] Whitt, W.: Understanding the efficiency of multi-server service systems. Manag. Sci. 38(5), 708–723 (1992)
[156] Whitt, W.: Blocking when service is required from several facilities simultaneously. AT&T Bell Lab. Tech. J. 64, 1807–1856 (1985)
[157] Wilkes, J.: More Google cluster data. Google research blog (2011). http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html. Accessed 15 Nov 2020
[158] Wilkes, J.: Google cluster-usage traces v3 (2019). http://github.com/google/cluster-data. Accessed 15 Nov 2020
[159] Xu, Y., Musgrave, Z., Noble, B., Bailey, M.: Bobtail: avoiding long tails in the cloud. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI’13), pp. 329–342, USA (2013)
[160] Zhan, X., Bao, Y., Bienia, C., Li, K.: PARSEC3.0: a multicore benchmark suite with network stacks and SPLASH-2X. ACM SIGARCH Comput. Arch. News 44, 1–16 (2017)
[161] Zhang, W., Fang, V., Panda, A., Shenker, S.: Kappa: a programming framework for serverless computing. In: ACM Symposium on Cloud Computing (SoCC’20), pp. 328–343 (2020)
[162] Zhu, T., Berger, D., Harchol-Balter, M.: SNC-Meister: admitting more tenants with tail latency SLOs. In: ACM Symposium on Cloud Computing (SoCC’16), pp. 374–387, Santa Clara, CA (2016)
[163] Zhu, T., Tumanov, A., Kozuch, M.A., Harchol-Balter, M., Ganger, G.R.: PriorityMeister: tail latency QoS for shared networked storage. In: ACM Symposium on Cloud Computing 2014 (SoCC’14), pp. 1–14, Seattle, WA (2014)

Acknowledgements

We would like to thank Sem Borst, Onno Boxma, and Isaac Grosof for their helpful suggestions and careful proof-reading.

Funding

Funding was provided by National Science Foundation (Grant numbers CMMI-1938909, CSR-1763701, XPS-1629444) and Google (Grant number 2020 Faculty Research Award).

Author information

Authors and Affiliations

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Mor Harchol-Balter

Corresponding author

Correspondence to Mor Harchol-Balter.

About this article

Cite this article

Harchol-Balter, M. Open problems in queueing theory inspired by datacenter computing. Queueing Syst 97, 3–37 (2021). https://doi.org/10.1007/s11134-020-09684-6

Received: 01 December 2020
Revised: 01 December 2020
Accepted: 04 December 2020
Published: 27 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11134-020-09684-6
