Queueing Systems, Volume 91, Issue 3–4, pp 207–239

Delay asymptotics and bounds for multitask parallel jobs

  • Weina Wang
  • Mor Harchol-Balter
  • Haotian Jiang
  • Alan Scheller-Wolf
  • R. Srikant


We study the delay of jobs that consist of multiple parallel tasks, a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, a tight analysis is still largely lacking, because analyzing job delay requires characterizing the complicated correlation among task delays, which is hard to do. We first consider an asymptotic regime where the number of servers, n, goes to infinity, and the number of tasks in a job, \(k^{(n)}\), is allowed to increase with n. We establish the asymptotic independence of any \(k^{(n)}\) queues under the condition \(k^{(n)}= o(n^{1/4})\). This greatly generalizes existing asymptotic independence results in the literature, which establish asymptotic independence only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays. We next consider the non-asymptotic regime. Here, we prove that independence yields a stochastic upper bound on job delay for any n and any \(k^{(n)}\) with \(k^{(n)}\le n\). The key component of our proof is a new technique we develop, called “Poisson oversampling.” Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems, where there is a negative correlation among bins, we prove that our variant exhibits positive correlation.
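The abstract's central objects (job delay as the maximum of task delays, and the maximum of independent task delays) can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not the authors' model or code: it assumes jobs arrive as a Poisson process of rate λn/k, that each job's k tasks are sent to k distinct servers chosen uniformly at random, that every server serves tasks first-come first-served, and that task service times are i.i.d. exponential with rate 1. Under these assumptions each server sees Poisson task arrivals of rate λ, so a single task's stationary delay is that of an M/M/1 queue, Exp(1 − λ), and the CDF of the maximum of k independent task delays is (1 − e^{−(1−λ)t})^k. The function names `simulate_fork_join`, `independent_max_cdf`, and `empirical_cdf` are our own.

```python
import math
import random


def simulate_fork_join(n, k, lam, num_jobs, seed=0, warmup=1000):
    """Return job delays in a hypothetical n-server fork-join system.

    Illustrative assumptions (not taken from the paper): jobs arrive as a
    Poisson process of rate lam * n / k, each job's k tasks go to k distinct
    servers chosen uniformly at random, each server is a FCFS queue, and task
    service times are i.i.d. Exp(1), so every server sees load lam.
    """
    if not (1 <= k <= n) or not (0 < lam < 1):
        raise ValueError("need 1 <= k <= n and 0 < lam < 1")
    rng = random.Random(seed)
    free_at = [0.0] * n  # time at which each server clears its current backlog
    t = 0.0
    job_delays = []
    for j in range(num_jobs + warmup):
        t += rng.expovariate(lam * n / k)  # arrival time of the next job
        servers = rng.sample(range(n), k)  # k distinct servers
        job_delay = 0.0
        for s in servers:
            # FCFS (Lindley-style) recursion: a task starts once the server is free.
            start = max(t, free_at[s])
            free_at[s] = start + rng.expovariate(1.0)
            # The job finishes only when its slowest task finishes.
            job_delay = max(job_delay, free_at[s] - t)
        if j >= warmup:  # discard jobs that arrived during the initial transient
            job_delays.append(job_delay)
    return job_delays


def independent_max_cdf(t, k, lam):
    """P(max of k independent Exp(1 - lam) task delays <= t)."""
    return (1.0 - math.exp(-(1.0 - lam) * t)) ** k


def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(x <= t for x in samples) / len(samples)


if __name__ == "__main__":
    n, k, lam = 50, 5, 0.5
    delays = simulate_fork_join(n, k, lam, num_jobs=50_000, seed=1)
    for t in (2.0, 5.0, 10.0):
        # columns: t, simulated P(job delay <= t), independent-max CDF
        print(t, round(empirical_cdf(delays, t), 3),
              round(independent_max_cdf(t, k, lam), 3))
```

If these assumptions match the setting of the non-asymptotic result, a stochastic upper bound on job delay means P(job delay ≤ t) is at least the independent-max CDF, so the second printed column should sit on or above the third, up to sampling noise. Tracking `free_at` per server avoids an event queue: because jobs are processed in arrival order, each server's FCFS queue reduces to the recursion departure = max(arrival, previous departure) + service.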


Keywords

Large systems · Asymptotic independence · Association of random variables · Parallel jobs

Mathematics Subject Classification

68M20 · 60K25



This work was supported in part by National Science Foundation Grants CPS ECCS-1739189, ECCS-1609370, XPS-1629444, and CMMI-1538204, the US Army Research Office (ARO Grant No. W911NF-16-1-0259), the US Office of Naval Research (ONR Grant No. N00014-15-1-2169), DTRA (Grant No. HDTRA1-16-0017), and a 2018 Faculty Award from Microsoft. Additionally, Haotian Jiang was supported in part by the Department of Physics at Tsinghua University.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Weina Wang (1, 2), corresponding author
  • Mor Harchol-Balter (2)
  • Haotian Jiang (3)
  • Alan Scheller-Wolf (4)
  • R. Srikant (1)
  1. Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, USA
  2. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Physics, Tsinghua University, Beijing, China
  4. Tepper School of Business, Carnegie Mellon University, Pittsburgh, USA
