Abstract
Datacenter operations today provide a plethora of new queueing and scheduling problems. The notion of a “job” has become more general and multi-dimensional. The ways in which jobs and servers can interact have grown in complexity, involving parallelism, speedup functions, precedence constraints, and task graphs. The workloads are vastly more variable and more heavy-tailed. Even the performance metrics of interest are broader than in the past, with multi-dimensional service-level objectives in terms of tail probabilities. The purpose of this article is to expose queueing theorists to new models, while providing suggestions for many specific open problems of interest, as well as some insights into their potential solution.
Notes
In the multiserver job model, we assume FCFS scheduling, which is what is used in datacenters. This is not to be confused with the virtual machine (VM) packing problem, where the literature has focused on packing jobs into VMs based on the number of resources that they request, so as to achieve throughput optimality (see [77, 85, 109, 110, 118]). However, even in the VM packing problem, waste can occur.
In the above example, we are thinking of the job as being run alone on the k servers. If two jobs are time-sharing the same k servers, then the service time of each will double.
If \(k <1\), it is common to assume that \(s(k) = k\), which is consistent with the intuition that if a job is allocated half a server, then it runs at half speed.
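A minimal Python sketch of such a speedup function follows. The linear branch for \(k < 1\) is the convention stated in this note. The branch for \(k \ge 1\) uses Amdahl's law with a parallelizable fraction `p`, which is an illustrative assumption and not a form prescribed here. The names `speedup`, `service_time`, and `p` are hypothetical.

```python
def speedup(k: float, p: float = 0.9) -> float:
    """Service rate s(k) of a job that is allocated k servers.

    For k < 1 the convention s(k) = k applies (half a server, half speed).
    For k >= 1 this sketch ASSUMES Amdahl's law with parallelizable
    fraction p; any other concave speedup function could be substituted.
    """
    if k < 0:
        raise ValueError("allocation k must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallelizable fraction p must lie in [0, 1]")
    if k < 1:
        return k
    # Amdahl's law: the serial part (1 - p) is not sped up, the rest is.
    return 1.0 / ((1.0 - p) + p / k)


def service_time(size: float, k: float, p: float = 0.9) -> float:
    """Time to complete `size` units of work when run alone on k servers."""
    rate = speedup(k, p)
    if rate == 0:
        return float("inf")  # a job with no allocation never finishes
    return size / rate
```

Both branches agree at \(k = 1\), where the rate is 1, so the sketch is continuous across the fractional and parallel regimes.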
Note that SRPT and FCFS are equivalent in the case where all jobs have the same size.
The optimal allocation is derived both for the case where the goal is to minimize mean response time and the case where the goal is to minimize mean slowdown. The slowdown metric is discussed in Sect. 7.1.3.
Gittins becomes SRPT when job sizes are known.
A job’s “rank” is its priority, where lower rank is better, and where ties are broken in FCFS order. Rank is a function of age, but can also depend on a job’s size or class [129].
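The rank convention in the note above can be sketched in Python as follows. This is an illustration under stated assumptions, not the formal definition from [129]: the `Job` fields and the three example rank functions (a constant rank for FCFS, rank equal to age for foreground-background, rank equal to remaining size for SRPT) are hypothetical names chosen to show how "lowest rank first, ties in FCFS order" selects a job.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    arrival: float    # arrival time, used only to break rank ties in FCFS order
    size: float       # total work the job requires
    age: float = 0.0  # work the job has received so far


Rank = Callable[[Job], float]


def fcfs_rank(job: Job) -> float:
    # Constant rank: every job ties, so the FCFS tie-break decides.
    return 0.0


def fb_rank(job: Job) -> float:
    # Foreground-background: rank depends on age only (size unknown).
    return job.age


def srpt_rank(job: Job) -> float:
    # SRPT: rank is the remaining size, which requires knowing the size.
    return job.size - job.age


def next_job(jobs: List[Job], rank: Rank) -> Job:
    """Return the job to serve: lowest rank, ties broken by earliest arrival.

    Raises ValueError if `jobs` is empty.
    """
    return min(jobs, key=lambda j: (rank(j), j.arrival))
```

For example, when every job has the same size, the job already in service has the smallest remaining size and the waiting jobs all tie, so `next_job(jobs, srpt_rank)` and `next_job(jobs, fcfs_rank)` return the same job.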
References
Amazon EC2. http://aws.amazon.com/ec2/. Accessed 15 Nov 2020
Azure Public Dataset (2019). https://github.com/Azure/AzurePublicDataset. Accessed 15 Nov 2020
Google Compute Engine. http://cloud.google.com/products/compute-engine.html. Accessed 15 Nov 2020
Windows Azure. http://www.windowsazure.com/. Accessed 15 Nov 2020
Datacenter Spending (2020). https://www.cbronline.com/news/data-centre-spending. Accessed 15 Nov 2020
Flexera: State of the Cloud Report (2020). https://www.flexera.com/blog/industry-trends/trend-of-cloud-computing-2020/. Accessed 15 Nov 2020
Aalto, S., Ayesta, U., Righter, R.: On the Gittins index in the M/G/1 queue. Queueing Syst. 63(1), 437–458 (2009)
Aalto, S., Ayesta, U., Righter, R.: Properties of the Gittins index with application to optimal scheduling. Probab. Eng. Inf. Sci. 25(3), 269–288 (2011)
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), pp. 265–283 (2016)
Abate, J., Choudhury, G.L., Whitt, W.: Asymptotics for steady-state tail probabilities in structured Markov queueing models. Stoch. Mod. 10(1), 99–143 (1994)
Abate, J., Choudhury, G.L., Whitt, W.: Waiting-time tail probabilities in queues with long-tail service-time distributions. Queueing Syst. 16, 311–338 (1994)
Abate, J., Choudhury, G.L., Whitt, W.: An introduction to numerical transform inversion and its application to probability models. In: Grassmann, W.K. (ed.) Computational Probability, pp. 257–323. Springer, Boston (2000)
Abate, J., Whitt, W.: A unified framework for numerically inverting Laplace transforms. INFORMS J. Comput. 18(4), 408–421 (2006)
Acar, U., Blelloch, G.E., Blumofe, R.: The data locality of work stealing. Theory Comput. Syst. 35(3), 321–347 (2002)
Afanaseva, L., Bashtova, E., Grishunina, S.: Stability analysis of a multi-server model with simultaneous service and a regenerative input flow. Methodol. Comput. Appl. Probab. 22, 1439–1455 (2020)
Afanaseva, L., Grishunina, S.: Stability conditions for a multiserver queueing system with a regenerative input flow and simultaneous service of a customer by a random number of servers. Queueing Syst. 94, 213–241 (2020)
Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallel DAG jobs online to minimize average flow time. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’16), pp. 176–189 (2016)
Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallelizable jobs online to minimize the maximum flow time. In: Symposium on Parallel Algorithms and Architectures (SPAA’16), pp. 195–205 (2016)
Agrawal, K., Li, J., Lu, K., Moseley, B.: Scheduling parallelizable jobs online to maximize throughput. In: LATIN 2018: Theoretical Informatics—13th Latin American Symposium, Buenos Aires, Argentina, pp. 755–776 (2018)
Ahmad, N., Greenberg, A.G., Lahiri, P., Maltz, D., Patel, P.K., Sengupta, S., Vaid, K.V.: Distributed load balancer. Google Patents. U.S. Patent App. 12/189,438 (2008)
Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M.: On the stability of redundancy models (2019). arXiv:1903.04414
Anton, E., Ayesta, U., Jonckheere, M., Verloop, I.M.: Improving the performance of heterogeneous data centers through redundancy (2020). arXiv:2003.01394
Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. In: 10th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 119–129 (1998)
Arthurs, E., Kaufman, J.: Sizing a message store subject to blocking criteria. In: IFIP Performance Conference, pp. 547–564 (1979)
AWS. Netflix & AWS Lambda Case Study. https://aws.amazon.com/solutions/case-studies/netflix-and-aws-lambda/. Accessed 15 Nov 2020
AWS. Step Functions. https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html. Accessed 15 Nov 2020
Baccelli, F., Foss, S.: Poisson hail on a hot ground. J. Appl. Probab. 48(A), 343–366 (2011)
Baccelli, F., Makowski, A.M.: Simple computable bounds for the fork–join queue. Technical Report RR-0394, INRIA (1985)
Baccelli, F., Makowski, A.M., Shwartz, A.: The fork–join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab. 21, 629–660 (1989)
Barroso, L.A., Holzle, U.: The case for energy-proportional computing. Computer 40(12), 33–37 (2007)
Bean, N.G., Gibbens, R.J., Zachary, S.: Asymptotic analysis of single resource loss systems in heavy traffic, with applications to integrated networks. Adv. Appl. Probab. 27(1), 273–292 (1995)
Bekker, R., Borst, S., Núñez-Queija, R.: Performance of TCP-friendly streaming sessions in the presence of heavy-tailed elastic flows. Perform. Eval. 61(2), 143–162 (2005)
Benameur, N., Fredj, S. Ben, Delcoigne, F., Oueslati-Boulahia, S., Roberts, J.W.: Integrated admission control for streaming and elastic traffic. In: International Workshop on Quality of Future Internet Services, pp. 69–81 (2001)
Berg, B., Dorsman, J.-P., Harchol-Balter, M.: Towards optimality in parallel job scheduling. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 1(2), 1–30 (2017). Article 40
Berg, B., Harchol-Balter, M., Moseley, B., Wang, W., Whitehouse, J.: Optimal resource allocation for elastic and inelastic jobs. In: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’20), pp. 75–87, Philadelphia, PA (2020)
Berg, B., Vesilo, R., Harchol-Balter, M.: heSRPT: Parallel scheduling to minimize mean slowdown. In: 38th International Symposium on Computer Performance, Modeling, Measurement, and Evaluation (IFIP PERFORMANCE 2020), Milan, Italy (2020)
Berger, D., Berg, B., Zhu, T., Sen, S., Harchol-Balter, M.: Robinhood: Tail latency aware caching—dynamic reallocation from cache-rich to cache-poor. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 195–212, Carlsbad, CA (2018)
Bienia, C., Kumar, S., Singh, J. P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08), pp. 72–81, New York, NY (2008)
Blelloch, G., Gibbons, P., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)
Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Simhadri, H.V.: Scheduling irregular parallel computations on hierarchical caches. In: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11), pp. 355–366, San Jose, California (2011)
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. In: IEEE Symposium on Foundations of Computer Science, pp. 356–368 (1994)
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)
Blumofe, R.D., Papadopoulos, D.: Hood: a user-level threads library for multiprogrammed multiprocessors. Technical Report, University of Texas at Austin (1999)
Bonald, T., Proutière, A.: On performance bounds for the integration of elastic and adaptive streaming flows. In: Joint International ACM SIGMETRICS/Performance Conference on Measurement and Modeling of Computer Systems, pp. 235–245 (2004)
Borst, S., Núñez-Queija, R., Zwart, B.: Sojourn time asymptotics in processor-sharing queues. Queueing Syst. 53(1–2), 31–51 (2006)
Borst, S.C., Boxma, O.J., Núñez-Queija, R., Zwart, B.: The impact of the service discipline on delay asymptotics. Perform. Eval. 54(2), 175–206 (2003)
Boxma, O.J., Deng, Q., Zwart, B.: Waiting-time asymptotics for the M/G/2 queue with heterogeneous servers. Queueing Syst. 40(1), 5–31 (2002)
Boxma, O.J., Zwart, B.: Tails in scheduling. SIGMETRICS Perform. Eval. Rev. 34(4), 13–20 (2007)
Brill, P.H., Green, L.: Queues in which customers receive simultaneous service from a random number of servers: a system point approach. Manag. Sci. 30(1), 51–68 (1984)
Cera, M.C., Georgiou, Y., Richard, O., Maillard, N., Navaux, P.O.A.: Supporting malleability in parallel architectures with dynamic CPUSETs mapping and dynamic MPI. In: Kant, K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) International Conference on Distributed Computing and Networking (ICDCN’10), pp. 242–257 (2010)
Chowdhury, R.A., Ramachandran, V., Silvestri, F., Blakeley, B.: Oblivious algorithms for multicores and networks of processors. J. Parallel Distrib. Comput. 73(7), 911–925 (2018)
Crovella, M., Harchol-Balter, M., Murta, C.: Task assignment in a distributed system: Improving performance by unbalancing load. In: Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pp. 268–269. Poster Session (1998)
Dasylva, A., Srikant, R.: Bounds on the performance of admission control and routing policies for general topology networks with multiple call centers. In: Eighteenth Annual IEEE INFOCOM’99 International Conference on Computer Communications, pp. 505–512 (1999)
Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Delimitrou, C., Kozyrakis, C.: Quasar: resource-efficient and QoS-aware cluster management. In: ASPLOS’14, pp. 127–144, Salt Lake City, Utah (2014)
den Iseger, P.: Numerical transform inversion using Gaussian quadrature. Probab. Eng. Inf. Sci. 20, 1–44 (2006)
den Iseger, P., Gruntjes, P., Mandjes, M.: A Wiener–Hopf based approach to numerical computations in fluctuation theory for Lévy processes. Math. Methods Oper. Res. 78(1), 101–118 (2013)
Dubner, H., Abate, J.: Numerical inversion of Laplace transforms by relating them to the finite Fourier cosine transform. J. ACM 15(1), 115–123 (1968)
Fan, Z., Sen, R., Koutris, P., Albarghouthi, A.: Automated tuning of query degree of parallelism via machine learning. In: Proceedings of the 3rd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (2020)
Filippopoulos, D., Karatza, H.: An M/M/2 parallel system model with pure space sharing among rigid jobs. Math. Comput. Model. 45(5), 491–530 (2007)
Foss, S., Konstantopoulos, T., Mountford, T.: Power law condition for stability of Poisson hail. J. Theor. Probab. 31, 684–704 (2018)
Foss, S., Korshunov, D.: Heavy tails in multi-server queue. Queueing Syst. Theory Pract. 52, 31–48 (2006)
Foss, S., Korshunov, D., Zachary, S.: An Introduction to Heavy-Tailed and Subexponential Distributions, 2nd edn. Springer, New York (2013)
Fouladi, S., Wahby, R.S., Shacklett, B., Balasubramaniam, K.V., Zeng, W., Bhalerao, R., Sivaraman, A., Porter, G., Winstein, K.: Encoding, fast and slow: low-latency video processing using thousands of tiny threads. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 363–376, Boston, MA (2017)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: ACM PLDI, pp. 212–223 (1998)
Gandhi, A., Doroudi, S., Harchol-Balter, M., Scheller-Wolf, A.: Exact analysis of the M/M/k/setup class of Markov chains via Recursive Renewal Reward. Queueing Syst. Theory Appl. 77(2), 177–209 (2014)
Gandhi, A., Gupta, V., Harchol-Balter, M., Kozuch, M.: Optimality analysis of energy-performance trade-off for server farm management. Perform. Eval. 67(11), 1155–1171 (2010)
Gandhi, A., Harchol-Balter, M., Adan, I.: Server farms with setup costs. Perform. Eval. 67(11), 1123–1138 (2010)
Gandhi, A., Harchol-Balter, M., Raghunathan, R., Kozuch, M.: AutoScale: dynamic, robust capacity management for multi-tier data centers. ACM Trans. Comput. Syst. 30(4), 1–26 (2012)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Van Houdt, B.: A better model for job redundancy: decoupling server slowdown and job size. ACM/IEEE Trans. Netw. 25(6), 3353–3367 (2017)
Gardner, K., Harchol-Balter, M., Scheller-Wolf, A., Velednitsky, M., Zbarsky, S.: Redundancy-d: the power of d choices for redundancy. Oper. Res. 65(4), 1078–1094 (2017)
Gardner, K., Zbarsky, S., Doroudi, S., Harchol-Balter, M., Hyytiä, E., Scheller-Wolf, A.: Queueing with redundant requests: exact analysis. Queueing Syst. Theory Appl. 83(3), 227–259 (2016)
Gardner, K., Zbarsky, S., Doroudi, S., Harchol-Balter, M., Hyytiä, E., Scheller-Wolf, A.: Reducing latency via redundant requests: exact analysis. In: ACM Sigmetrics 2015 Conference on Measurement and Modeling of Computer Systems, pp. 347–360 (2015)
Gavish, B., Schweitzer, P.J.: The Markovian queue with bounded waiting time. Manag. Sci. 23(12), 1349–1357 (1977)
Ghaderi, J.: Randomized algorithms for scheduling VMs in the cloud. In: 35th Annual IEEE International Conference on Computer Communications, INFOCOM 2016, San Francisco, CA, USA, April 10–14, 2016, pp. 1–9 (2016)
Gittins, J.C., Glazebrook, K.D., Weber, R.: Multi-armed Bandit Allocation Indices. Wiley, New York (2011)
Glynn, P.W., Whitt, W.: Logarithmic asymptotics for steady-state tail probabilities in a single-server queue. J. Appl. Probab. 31(A), 131–156 (1994)
Goldstein, S.C., Schauser, K.E., Culler, D.E.: Lazy threads: implementing a fast parallel call. J. Parallel Distrib. Comput. 37(1), 5–20 (1996)
Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math. 5, 287–326 (1979)
Grosof, I., Harchol-Balter, M., Scheller-Wolf, A.: Stability for two-class multiserver-job systems (2020). arXiv:2010.00631
Grosof, I., Scully, Z., Harchol-Balter, M.: SRPT for multiserver systems. Perform. Eval. 127–128, 154–175 (2018)
Grosof, I., Scully, Z., Harchol-Balter, M.: Load balancing guardrails: keeping your heavy traffic on the road to low response times. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 3(2), 1–31 (2019). Article 42
Guo, M., Guan, Q., Ke, W.: Optimal scheduling of VMs in queueing cloud computing systems with a heterogeneous workload. IEEE Access 6, 15178–15191 (2018)
Gupta, A., Acun, B., Sarood, O., Kale, L.: Towards realizing the potential of malleable jobs. In: IEEE International Conference on High Performance Computing (HiPC’14) (2014)
Harchol-Balter, M.: Network analysis without exponentiality assumptions. Ph.D. thesis, University of California at Berkeley (1996)
Harchol-Balter, M.: The effect of heavy-tailed job size distributions on computer system design. In: Proceedings of ASA-IMS Conference on Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, Washington, DC (1999)
Harchol-Balter, M.: Task assignment with unknown duration. J. ACM 49(2), 260–288 (2002)
Harchol-Balter, M.: Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, Cambridge (2013)
Harchol-Balter, M., Crovella, M., Murta, C.: On choosing a task assignment policy for a distributed server system. In: Lecture Notes in Computer Science, No. 1469: 10th International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, pp. 231–242 (1998)
Harchol-Balter, M., Downey, A.: Exploiting process lifetime distributions for dynamic load balancing. In: Proceedings of ACM SIGMETRICS, pp. 13–24, Philadelphia, PA (1996)
Harchol-Balter, M., Downey, A.: Exploiting process lifetime distributions for dynamic load balancing. ACM Trans. Comput. Syst. 15(3), 253–285 (1997)
Harchol-Balter, M., Schroeder, B., Bansal, N., Agrawal, M.: Size-based scheduling to improve web performance. ACM Trans. Comput. Syst. 21(2), 207–233 (2003)
Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41, 33–38 (2008)
Horvath, G., Horvath, I., Almousa, S.A.-D., Telek, M.: Numerical inverse Laplace transformation using concentrated matrix exponential distributions. Perform. Eval. 137, 1–22 (2019)
Hunt, P.J., Kurtz, T.G.: Large loss networks. Stoch. Process. Appl. 53(2), 363–378 (1994)
Hyytiä, E., Aalto, S., Penttinen, A.: Minimizing slowdown in heterogeneous size-aware dispatching systems. In: Proceedings of the 2012 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2012)
Jonas, E., Pu, Q., Venkataraman, S., Stoica, I., Recht, B.: Occupy the cloud: distributed computing for the 99%. In: Proceedings of the 2017 Symposium on Cloud Computing, pp. 445–451, New York, NY (2017)
Jonas, E., Schleier-Smith, J., Sreekanti, V., Tsai, C., Khandelwal, A., Pu, Q., Shankar, V., Carreira, J., Krauth, K., Yadwadkar, N.J., Gonzalez, J.E., Popa, R.A., Stoica, I., Patterson, D.A.: Cloud programming simplified: a Berkeley view on serverless computing (2019). CoRR, arXiv:1902.03383
Joshi, G., Soljanin, E., Wornell, G.: Efficient replication of queued tasks for latency reduction in cloud systems. In: Allerton Conference on Communication, Control, and Computing, University of Illinois, Urbana-Champaign (2015)
Kim, S.S.: M/M/s queueing system where customers demand multiple server use. Ph.D. thesis, Southern Methodist University (1979)
Lee, K., Shah, N.B., Huang, L., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. IEEE Trans. Inf. Theory 63(5), 2822–2842 (2017)
Leonardi, S., Raz, D.: Approximating total flow time on parallel machines. In: Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), pp. 110–119 (1997)
Li, H., Groep, D., Wolters, L.: Workload characteristics of a multicluster supercomputer. In: 10th International Conference on Job Scheduling Strategies for Parallel Processing (IPPS’04), pp. 176–193. Springer (2004)
Lin, S.-H., Paolieri, M., Chou, C.F., Golubchik, L.: A model-based approach to streamlining distributed training for asynchronous SGD. In: MASCOTS 2018, pp. 306–318 (2018)
Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J.R., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)
Madni, S.H.H., Latiff, M.S.A., Abdullahi, M., Abdulhamid, S.M., Usman, M.J.: Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment. PLoS ONE 12(5), 1–26 (2017)
Maguluri, S.T., Srikant, R.: Scheduling jobs with unknown duration in clouds. IEEE/ACM Trans. Netw. 22(6), 1938–1951 (2014)
Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in cloud computing clusters. In: Proceedings of IEEE INFOCOM, pp. 702–710 (2012)
Massoulie, L., Roberts, J.W.: Bandwidth sharing and admission control for elastic traffic. Telecommun. Syst. 15, 185–201 (2000)
Melikov, A.: Computation and optimization methods for multiresource queues. Cybern. Syst. Anal. 32(6), 821–836 (1996)
Mok, A.: Fundamental design problems of distributed systems for the hard real-time environment. Ph.D. thesis, MIT, Department of EE and CS (1983)
Morozov, E., Rumyantsev, A.S.: Stability analysis of a MAP/M/s cluster model by matrix-analytic method. In: Fiems, D., Paolieri, M., Platis, A.N. (eds.) Computer Performance Engineering—13th European Workshop, EPEW 2016, Chios, Greece, October 5–7, 2016, Proceedings, volume 9951 of Lecture Notes in Computer Science, pp. 63–76. Springer (2016)
Narlikar, G.J.: Scheduling threads for low space requirement and good locality. Theory Comput. Syst. 35(2), 151–187 (2002)
Nelson, R.D., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)
Ponomarenko, L., Kim, C.S., Melikov, A.: Performance Analysis and Optimization of Multi-traffic on Communication Networks. Springer, Berlin (2010)
Psychas, K., Ghaderi, J.: On non-preemptive VM scheduling in the cloud. Proc. ACM Meas. Anal. Comput. Syst. 1(2), 1–29 (2017). Article 35
Raaijmakers, Y., Borst, S., Boxma, O.: Delta probing policies for redundancy. Perform. Eval. 127–128, 21–35 (2018)
Raaijmakers, Y., Borst, S., Boxma, O.: Redundancy scheduling with scaled Bernoulli service requirements. Queueing Syst. 93(1–2), 67–82 (2019)
Raaijmakers, Y., Borst, S., Boxma, O.: Stability of redundancy systems with processor sharing. In: Proceedings of the 13th International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS’20), pp. 120–127 (2020)
Rizk, A., Poloczek, F., Ciucu, F.: Stochastic bounds in fork–join queueing systems under full and partial mapping. Queueing Syst. 83(3), 261–291 (2016)
Rumyantsev, A., Morozov, E.: Stability criterion of a multiserver model with simultaneous service. Ann. Oper. Res. 252(1), 29–39 (2017)
Schrage, L.E.: A proof of the optimality of the shortest remaining processing time discipline. Oper. Res. 16, 678–690 (1968)
Schrage, L.E., Miller, L.W.: The queue M/G/1 with the shortest remaining processing time discipline. Oper. Res. 14, 670–684 (1966)
Schroeder, B., Harchol-Balter, M.: Evaluation of task assignment policies for supercomputing servers: the case for load unbalancing and fairness. Clust. Comput. J. Netw. Softw. Tools Appl. 7(2), 151–161 (2004)
Scully, Z., Grosof, I., Harchol-Balter, M.: The Gittins policy is nearly optimal in the M/G/k under extremely general conditions. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 3(4), 1–29 (2020). Article 43
Scully, Z., Grosof, I., Harchol-Balter, M.: Optimal multiserver scheduling with unknown job sizes in heavy traffic. In: 38th International Symposium on Computer Performance, Modeling, Measurement, and Evaluation (IFIP PERFORMANCE 2020), Milan, Italy (2020)
Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: SOAP: one clean analysis of all age-based scheduling policies. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 2(1), 1–30 (2018). Article 16
Scully, Z., Harchol-Balter, M., Scheller-Wolf, A.: Simple near-optimal scheduling for the M/G/1. Proc. ACM Meas. Anal. Comput. Syst. (POMACS/SIGMETRICS) 4(1), 1–29 (2020). Article 11
Shankar, V., Krauth, K., Pu, Q., Jonas, E., Venkataraman, S., Stoica, I., Recht, B., Ragan-Kelley, J.: Numpywren: serverless linear algebra (2018). CoRR, arXiv:1810.09679
Shneer, S., Stolyar, A.: Large-scale parallel server system with multi-component jobs (2020). arXiv:2006.11256
Sigman, K.: Appendix: a primer on heavy-tailed distributions. Queueing Syst. 33(1/3), 261–275 (1999)
Simhadri, H.V., Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Kyrola, A.: Experimental analysis of space-bounded schedulers. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’14), pp. 30–41, Prague, Czech Republic (2014)
Smith, W.L.: On the distribution of queueing times. Math. Proc. Camb. Philos. Soc. 49(3), 449–461 (1953)
Snyder, B.: Server virtualization has stalled, despite the hype (2010). InfoWorld. https://www.infoworld.com/article/2624771/server-virtualization-has-stalled--despite-the-hype.html. Accessed 15 Nov 2020
Sreekanti, V., Chenggang, W., Lin, X.C., Schleier-Smith, J., Gonzalez, J., Hellerstein, J.M., Tumanov, A.: Cloudburst: stateful functions-as-a-service. Proc. VLDB Endow. 13(11), 2438–2452 (2020)
Sun, Y., Zheng, Z., Koksal, C.E., Kim, K.-H., Shroff, N.B.: Provably delay efficient data retrieving in storage clouds. In: Proceedings of IEEE INFOCOM (2015)
Talbot, A.: The accurate numerical inversion of Laplace transforms. IMA J. Appl. Math. 23(1), 97–120 (1979)
Tang, C., Yu, K., Veeraraghavan, K., Kaldor, J., Michelson, S., Kooburat, T., Anbudurai, A., Clark, M., Gogia, K., Cheng, L., Christensen, B., Gartrell, A., Khutornenko, M., Kulkarni, S., Pawlowski, M., Pelkonen, T., Rodrigues, A., Tibrewal, R., Venkatesan, V., Zhang, P.: Twine: a unified cluster management system for shared infrastructure. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI’20) (2020)
Thomasian, A.: Analysis of fork/join and related queueing systems. ACM Comput. Surv. 47(2), 1–71 (2014)
Tian, H., Zheng, Y., Wang, W.: Characterizing and synthesizing task dependencies of data-parallel jobs in Alibaba cloud. In: 10th ACM Symposium on Cloud Computing (SoCC’19), Santa Cruz, CA (2019)
Tikhonenko, O.M.: Generalized Erlang problem for service systems with finite total capacity. Probl. Inf. Transm. 41(3), 243–253 (2005)
Tirmazi, M., Barker, A., Deng, N., Haque, M.E., Qin, Z.G., Hand, S., Harchol-Balter, M., Wilkes, J.: Borg: the next generation. In: Proceedings of the 15th European Conference on Computer Systems (EuroSys’20), pp. 1–14, Greece (2020)
Trueman, C.: Why data centres are the new frontier in the fight against climate change. Computerworld (2019)
Van Dijk, N.M.: Blocking of finite source inputs which require simultaneous servers with general think and holding times. Oper. Res. Lett. 8(1), 45–52 (1989)
Vandevoorde, M.T., Roberts, E.S.: WorkCrews: an abstraction for controlling parallelism. Int. J. Parallel Program. 17(4), 347–366 (1988)
Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., Wilkes, J.: Large-scale cluster management at Google with Borg. In: Proceedings of the 10th European Conference on Computer Systems, p. 18 (2015)
Wang, D., Joshi, G., Wornell, G.W.: Efficient straggler replication in large-scale parallel computing. Proc. ACM Meas. Model. Comput. Syst. (ACM SIGMETRICS 2019) 4(2), 1–23 (2019). Article 7
Wang, W., Harchol-Balter, M., Jiang, H., Scheller-Wolf, A., Srikant, R.: Delay asymptotics and bounds for multi-task parallel jobs. Queueing Syst. Theory Appl. 91(3), 207–239 (2019)
Wang, W., Xie, Q., Harchol-Balter, M.: Zero queueing for multi-server jobs (2020). arXiv:2011.10521
Wardley, S.: Why the fuss about serverless? (2016). https://blog.gardeviance.org/2016/11/why-fuss-about-serverless.html. Accessed 15 Nov 2020
Welch, P.D.: On a generalized M/G/1 queueing process in which the first customer of each busy period receives exceptional service. Oper. Res. 12, 736–752 (1964)
Weng, W., Wang, W.: Dispatching parallel jobs to achieve zero queueing delay (2020). arXiv:2004.02081
Whitt, W.: Understanding the efficiency of multi-server service systems. Manag. Sci. 38(5), 708–723 (1992)
Whitt, W.: Blocking when service is required from several facilities simultaneously. AT&T Bell Lab. Tech. J. 64, 1807–1856 (1985)
Wilkes, J.: More Google cluster data. Google research blog (2011). http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html. Accessed 15 Nov 2020
Wilkes, J.: Google cluster-usage traces v3 (2019). http://github.com/google/cluster-data. Accessed 15 Nov 2020
Xu, Y., Musgrave, Z., Noble, B., Bailey, M.: Bobtail: avoiding long tails in the cloud. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI’13), pp. 329–342, USA (2013)
Zhan, X., Bao, Y., Bienia, C., Li, K.: PARSEC3.0: a multicore benchmark suite with network stacks and SPLASH-2X. ACM SIGARCH Comput. Arch. News 44, 1–16 (2017)
Zhang, W., Fang, V., Panda, A., Shenker, S.: Kappa: A programming framework for serverless computing. In: ACM Symposium on Cloud Computing (SoCC’20), pp. 328–343 (2020)
Zhu, T., Berger, D., Harchol-Balter, M.: SNC-Meister: admitting more tenants with tail latency SLOs. In: ACM Symposium on Cloud Computing (SoCC’16), pp. 374–387, Santa Clara, CA (2016)
Zhu, T., Tumanov, A., Kozuch, M.A., Harchol-Balter, M., Ganger, G.R.: PriorityMeister: tail latency QoS for shared networked storage. In: ACM Symposium on Cloud Computing 2014 (SoCC’14), pp. 1–14, Seattle, WA (2014)
Acknowledgements
We would like to thank Sem Borst, Onno Boxma, and Isaac Grosof for their helpful suggestions and careful proof-reading.
Funding
Funding was provided by National Science Foundation (Grant numbers CMMI-1938909, CSR-1763701, XPS-1629444) and Google (Grant number 2020 Faculty Research Award).
Cite this article
Harchol-Balter, M. Open problems in queueing theory inspired by datacenter computing. Queueing Syst 97, 3–37 (2021). https://doi.org/10.1007/s11134-020-09684-6
DOI: https://doi.org/10.1007/s11134-020-09684-6